Dr. Mike Evans — Chief Executive A unique targeted sequencing service providing meaningful results, not insurmountable data
Outline of presentation <ul><li>Delivering a unique next generation sequencing service  —  Dr Mike Evans, CEO </li></ul><u...
OGT - provides advanced clinical genetics solutions    - develops innovative molecular diagnostics <ul><li>Founded by Ed S...
OGT’s key businesses IP Licensing 40 licence relationships Technologies For Molecular Medicine Clinical and Genomic Soluti...
Clinical and Genomic Solutions <ul><li>Addressing the challenges of high-throughput, high-resolution molecular technologie...
Genefficiency™ — World’s Leading aCGH Service <ul><li>High-quality data & complete reassurance </li></ul><ul><ul><li>Exper...
Independent Accreditations <ul><li>First Agilent High-Throughput Microarray Certified Service Provider </li></ul><ul><li>I...
Customer Satisfaction… 20,000 samples.  1,000 samples / week “ In order to characterise genetic variants, reproducible per...
OGT Collaborators and Customers
A World-class Team <ul><li>Our expert team deliver: </li></ul><ul><li>Excellent project management and customer service </...
Delivering Discovery <ul><li>Genefficiency Targeted Sequencing Services — designed to be different: </li></ul><ul><li>Comp...
Delivering an Integrated, Comprehensive Service  30/06/11 1. Selection of most appropriate genomic regions for enrichment ...
Delivering Expert Project Design <ul><li>Step 1:  Selection of most appropriate genomic regions for your project  and budg...
Delivering Class-leading Technology <ul><li>We have fully optimised the DNA capture and sequencing methodologies, so you d...
OGT Delivers Discovery, not just Data <ul><li>Step 3:  Data analysis and advanced filtering of variants  </li></ul><ul><li...
OGT Genefficiency Targeted Sequencing Services <ul><li>The PLATFORM </li></ul><ul><ul><li>Core sequencing platform: Illumi...
Outline of presentation <ul><li>Delivering a unique next generation sequencing service  —  Dr Mike Evans, CEO </li></ul><u...
Agenda <ul><li>Important Definitions and Terminologies </li></ul><ul><li>Introduction to Targeted Enrichment </li></ul><ul...
Definitions and Terminologies <ul><li>Read length – The number of bases sequenced in a fragment </li></ul><ul><li>Capture ...
Read Depth Will Vary Across a Region of Interest *Sequence Depth >20x: ~82% for Single End How many times has a base been ...
Read Depth Will Vary Across a Region of Interest *Sequence Depth >20x: ~82% for Single End ~90% for Paired End How many ti...
Read Depth Required for Mutation Detection <ul><li>Assuming no allelic bias the theoretical read depth required to detect ...
Agenda <ul><li>Important Definitions and Terminologies </li></ul><ul><li>Introduction to Targeted Enrichment </li></ul><ul...
Why use Targeted Enrichment?  <ul><li>Flexibility in choice of genomic loci </li></ul><ul><ul><li>Allows  capture of speci...
Targeted Approaches Introduce Bias There are significant imbalances in the sequence coverage achieved, particularly with t...
Example of Design Bias  - Insufficient Coverage Targeted gene sequencing can lead to some targets without the required dep...
Solution: Intelligent Design to Improve Coverage:  <ul><li>Option 1: </li></ul><ul><li>Increase coverage by increasing dep...
Agenda <ul><li>Important Definitions and Terminologies </li></ul><ul><li>Introduction to Targeted Enrichment </li></ul><ul...
Problems Facing Users <ul><ul><li>Design tools not user friendly </li></ul></ul><ul><ul><li>Design tools only good for dra...
OGT’s Design Pipeline – what we need from you: <ul><li>Regions of Interest </li></ul><ul><ul><li>Gene lists </li></ul></ul...
<ul><li>Assess the output: </li></ul><ul><ul><li>Coverage </li></ul></ul><ul><ul><li>Bait distribution </li></ul></ul><ul>...
<ul><li>Assess the output: </li></ul><ul><ul><li>Coverage </li></ul></ul><ul><ul><li>Bait distribution </li></ul></ul><ul>...
<ul><li>This ensures that small regions are captured as well as large regions </li></ul><ul><li>Advantage - Improves evenn...
<ul><li>GC content  </li></ul><ul><li>Calculate GC content for all baits </li></ul><ul><li>Identify those baits where GC c...
Customer Report <ul><li>Design Parameters </li></ul><ul><li>Depth of Coverage </li></ul><ul><ul><li>On target / Off target...
<ul><ul><li>Better ‘evenness’ of coverage helps ensure no regions are  missed and  maximises the likelihood of variant det...
Summary <ul><li>Custom design of regions for targeted sequencing offers significant flexibility for many applications </li...
Outline of presentation <ul><li>Delivering a unique next generation sequencing service  —  Dr Mike Evans, CEO </li></ul><u...
Adding Value Through Analysis <ul><li>Introduction </li></ul><ul><li>NGS data analysis </li></ul><ul><ul><li>Primary analy...
The Analysis Challenge Sequencer Hard drive with  ~4Gb per exome Publication
Raw Data: FASTQ (standard text representation of short reads) <ul><li>FASTQ uses four lines per sequence.  </li></ul><ul><...
Phred Quality Scores <ul><li>Phred is an accurate base-caller used for capillary traces  (Ewing et al Genome Research 1998...
<ul><li>FASTQ is FASTA with quality scores added. Standard output format of NGS basecalling; </li></ul><ul><li>SAM and BAM...
<ul><li>Just FASTQ files </li></ul><ul><li>Data mapped and assembly  (vs. genome or exome? De-duplicated?  Locally re-alig...
Adding Value Through Analysis <ul><li>Introduction </li></ul><ul><li>NGS data analysis </li></ul><ul><ul><li>Primary analy...
Primary Analysis  - Mapping and Alignment Raw Sequence Files FASTQ Format Mapping BWA/Bowtie Raw Alignment Files SAM/BAM F...
Why Mark Duplicates and Realignment around Indels?
Why Mark Duplicates and Realignment around Indels? 3 incorrect calls within 40bp!
Primary Analysis  - Mapping and Alignment Raw Sequence Files FASTQ Format Mapping BWA/Bowtie Raw Alignment Files SAM/BAM F...
NGS Variant Calling Methods <ul><li>Option 1 - Hard filtering   </li></ul><ul><li>Example: SNP can only be called if </li>...
Base Quality Score Re-Calibration Source: The Broad Institute http://www.broadinstitute.org/files/shared/mpg/nextgen2010/n...
Primary Analysis  – Raw data and assembly QC Raw Sequence Files FASTQ Format Mapping BWA/Bowtie Raw Alignment Files SAM/BA...
Primary Analysis  – Raw data and assembly QC Raw Sequence Files FASTQ Format Mapping BWA/Bowtie Raw Alignment Files SAM/BA...
Secondary Analysis  SNP and Indel calling, annotation and filtering GATK Unified Genotyper Analysis-ready alignment SNPs I...
SNP/Indel Classification (standard analysis) <ul><li>We check and annotate every single detected SNP and Indel against all...
<ul><li>SIFT  predicts whether an  amino acid substitution affects protein function  </li></ul><ul><li>based on  </li></ul...
PolyPhen: Prediction of Functional Effect of nsSNPs <ul><li>PolyPhen  (= Poly morphism  Phen otyping) is an automatic tool...
OGT Processing Overview Data Information Individual Genome Analysis (Standard Level) Multi Genome Analysis, Data Gathering...
NGS Data Delivery Hard drive (or FTP) ship data browse Double click! Copy data to shared drive or local hard drive and...
NGS Data Delivery Hard drive (or FTP) ship data browse Comprehensive HTML analysis report Copy data to shared drive or loc...
NGS Data Delivery Hard drive (or FTP) ship data browse Comprehensive HTML analysis report Copy data to shared drive or loc...
Analysis Report: Summary Section
Analysis Report: QC Section – Read QC
Analysis Report: QC Section – Alignment QC
<ul><li>Analysis Section - Overview </li></ul>
<ul><li>The Variant Table View </li></ul>Filter Interface
<ul><li>The Variant Table View </li></ul>Data display Data export
<ul><li>The Variant Table View – External Links </li></ul>
<ul><li>The Detailed Variant View </li></ul>
<ul><li>Predicted Consequences on Protein Function </li></ul>
Alignment View of Selected Variant in IGV
Interactive Data Filtering
Case Study: a published exome study <ul><ul><li>Multi exome study reveal causative mutation of monogenic disorder </li></u...
Analysis Report: Supplementary Section
Summary OGT offers fast, accurate & powerful NGS analysis <ul><ul><li>Standard Analysis </li></ul></ul><ul><ul><li>Robust ...
Outline of Presentation <ul><li>Delivering a unique next generation sequencing service  —  Dr Mike Evans, CEO </li></ul><u...
Please Enjoy Your Lunch! <ul><li>Come and visit us at Booth #562 </li></ul><ul><li>Complete a survey for the chance to win...
Thank you www.ogt.co.uk
Eshg sequencing workshop

Published in: Technology, Business
  • Mention all business areas and how the skill sets within each business area complement each other - especially in bioinformatics (over 9 dedicated, experienced employees) helping customers to not only design the right experiments but also access meaningful results. Could mention the ER group – lends additional credibility.
  • The last 20 years have seen the rapid development of high-throughput, high-resolution molecular technologies for deciphering the genetic basis of disease. Keeping abreast of these new technologies is both expensive and time-consuming. Equipment quickly becomes obsolete as newer, more powerful platforms are developed. In addition, the sheer volume and complexity of the data generated using these technologies have placed great emphasis on the development of tools and services that can identify relevant genetic variation.
  • Launched in October 2009, Genefficiency Microarray Services has quickly established OGT as the service provider of choice for high-throughput microarray processing and data analysis. Genefficiency provides a tailor-made service taking you through every step of the process, from experimental and array design through to sample processing and comprehensive data analysis. Clinical and translational researchers have utilised our service in areas such as GWAS, cancer, molecular psychiatry and gene expression analysis. We place paramount importance on our quality control steps and provide complete reassurance through over 40 QC steps on each sample. Our bespoke Laboratory Information Management System records each QC step and provides a 360 degree audit trail on reagents, consumables and equipment. This attention to detail and proven delivery of high-quality results has been recognised by a number of independent organisations...
  • Including Agilent, who named us as their first High-Throughput Microarray Certified Service Provider. In addition, we are accredited by the International Organisation for Standardisation for a number of standards essential for delivering high-quality genomic services
  • We have processed over 40,000 aCGH samples (50,000 if we were to include Steve Rich [since 2008]) in our high-throughput lab, including many high-profile projects such as the WTCCC (20,000 samples in <20 weeks).
  • General outsourced services: CRUK, ICR. Custom aCGH services: UVA, Leuven, Wessex, Sanger, UCSD, CSHL. Custom aCGH products: Karolinska, Emory, Greenwood. Custom methylation services: Greenwood.
  • Our unique experience designing, processing, analysing and handling large-scale microarray projects provides us with the expertise required to deliver world-class genomic services for other technologies including….
  • OGT’s new targeted sequencing services are designed to take you from project concept right through to qualified result. The decision to sequence is just the beginning. We can provide comprehensive services, tailored to suit your project. This includes flexible and expert design upstream, advanced data analysis, full support and advice throughout plus a dedication to quality which ensures that you can have confidence in the results you get back.
  • OGT’s core expertise in the two key areas of probe design and data analysis allows us to focus on Steps 1 and 3 of the sequencing process: Step 1 is the selection of the most appropriate genomic content and design of capture probes to ensure efficiency and uniformity of capture; Step 3 is the analysis, filtering and prioritisation of variants, so that you receive information, not just data.
  • Good project design is essential and it starts with selection of the most appropriate genomic content for your study. While whole genome sequencing holds much promise, a targeted approach is still the most commonly used and offers many benefits, including lower costs, faster turnaround time, and lower data complexity. Whole exome allows you to cover the regions which are most likely to be associated with many disorders (e.g. Mendelian); indeed, many of the findings from whole genome studies could also have been discovered with an exome-based approach. Sequencing cost is significantly lower, and data analysis is simpler than for whole genome. For small projects, a whole exome approach can be more cost-effective than a custom-based design, because you can use “off the shelf” exome kits. Custom region design also offers significant benefits for some studies: custom design allows you to include non-coding regions, or focus on particular candidate regions, post-GWAS for example. Shorter regions are more cost-effective to sequence because you can multiplex samples. Alternatively, you can benefit from much greater depth of sequencing coverage, to increase confidence of detection. Custom design requires optimisation of capture probe design, to ensure you capture all of the region of interest as evenly as possible. This is where OGT can add significant value, with our expertise in probe design, which will be covered by Jolyon later.
  • “The biggest bottleneck in sequencing is data analysis.”
  • Through combination of internal validation and market feedback we chose the platforms above.
  • Sanger format: Phred scores 0-93 in ASCII encoding; Illumina/Solexa: 0-62. 40-60 million such sequences per exome!
  • Sanger format: Phred scores 0-93 in ASCII encoding; Illumina/Solexa: 0-62
  • Google images and you come across a wild rose and Kevin Rose, not quite what you expected. Same can happen in NGS data analysis – you don’t always get what you want
  • Recalibration of quality scores
  • Color coding?
  • Recalibration of quality scores
  • Hard filtering is like microarray analysis that just looks for 2x up- or down-regulated genes
  • Quality scores generated by sequencers are not very accurate! RMSE (root mean square error) is a measure of the differences between the values estimated and the values actually observed. Looking at millions of reads, even small inaccuracies result in thousands of mistakes!
  • Recalibration of quality scores
  • SVs?
  • Other components to add over time: alignment viewer, pathway information, GO annotation, genotype-based error rate
  • Both hard drive and/or FTP possible!
  • Let’s have a closer look at the report... Tabs, summary stats, variant overview,
  • Excel or TXT download
  • More links to come like OMIM; Excel or TXT download
  • 2000 mutations to 200. Redundancy due to transcript level reporting
  • Quality and speed Filtering: you are the expert
  • Transcript of "Eshg sequencing workshop"

    1. 1. Dr. Mike Evans — Chief Executive A unique targeted sequencing service providing meaningful results, not insurmountable data
    2. 2. Outline of presentation <ul><li>Delivering a unique next generation sequencing service — Dr Mike Evans, CEO </li></ul><ul><li>Optimised bait design for targeted sequencing — Dr Jolyon Holdstock, Senior Computational Biologist </li></ul><ul><li>Adding value through analysis — Dr Volker Brenner, Head of Computational Biology </li></ul><ul><li>Summary </li></ul><ul><li>Q&A </li></ul><ul><li>Lunch </li></ul>
    3. 3. OGT - provides advanced clinical genetics solutions - develops innovative molecular diagnostics <ul><li>Founded by Ed Southern in 1995 </li></ul><ul><li>64 people </li></ul>OGT Begbroke: Corporate offices and high-throughput labs OGT Southern Centre: Biomarker discovery
    4. 4. OGT’s key businesses IP Licensing 40 licence relationships Technologies For Molecular Medicine Clinical and Genomic Solutions Cytogenetics products and genomic services Diagnostic Biomarkers Genomic- and protein-based diagnostics
    5. 5. Clinical and Genomic Solutions <ul><li>Addressing the challenges of high-throughput, high-resolution molecular technologies: </li></ul><ul><li>High equipment and staff training costs </li></ul><ul><li>Short equipment lifespan </li></ul><ul><li>Complex study design and processes (e.g. platform evaluation & selection) </li></ul><ul><li>Vast amounts of data </li></ul><ul><ul><li>Extensive computing infrastructure </li></ul></ul><ul><ul><li>Data analysis expertise and resource </li></ul></ul>The solution: Genefficiency Genomic Services
    6. 6. Genefficiency™ — World’s Leading aCGH Service <ul><li>High-quality data & complete reassurance </li></ul><ul><ul><li>Experimental and array design expertise </li></ul></ul><ul><ul><li>High-throughput processing (>2000 samples / week) </li></ul></ul><ul><ul><li>Applications: aCGH-CNV, methylation, miRNA, gene expression analysis </li></ul></ul><ul><ul><li>Comprehensive data analysis services </li></ul></ul><ul><ul><li>>40 QC checks on each sample to ensure high-quality data </li></ul></ul>
    7. 7. Independent Accreditations <ul><li>First Agilent High-Throughput Microarray Certified Service Provider </li></ul><ul><li>ISO 9001:2008 — Quality management systems </li></ul><ul><li>ISO 27001:2005 — Information security </li></ul><ul><li>ISO 17025:2005 — aCGH Laboratory services </li></ul>4593 FS 561156 IS 561157
    8. 8. Customer Satisfaction… 20,000 samples. 1,000 samples / week “In order to characterise genetic variants, reproducible performance and reliable processing of the high resolution microarrays is essential. We were pleased with OGT’s responsive approach and attention to producing high quality data to tight deadlines.” Dr Matt Hurles, Wellcome Trust Sanger Institute
    9. 9. OGT Collaborators and Customers
    10. 10. A World-class Team <ul><li>Our expert team deliver: </li></ul><ul><li>Excellent project management and customer service </li></ul><ul><ul><li>>600 projects to date </li></ul></ul><ul><ul><li>>50,000 samples </li></ul></ul><ul><li>Unparalleled expertise in study and probe design </li></ul><ul><li>Advanced data analysis through a dedicated team of bioinformaticians </li></ul><ul><li>Rapid turnaround times </li></ul><ul><li>A wealth of experience of clinical and translational research projects </li></ul>
    11. 11. Delivering Discovery <ul><li>Genefficiency Targeted Sequencing Services — designed to be different: </li></ul><ul><li>Comprehensive — taking you from genomic DNA to filtered, qualified results </li></ul><ul><li>Rigorously designed — project and probe design expertise maximises your likelihood of discovery </li></ul><ul><li>Expert support — experienced team of biologists and bioinformaticians </li></ul><ul><li>Dedication to quality — from sample to result, delivering reliable results every time </li></ul>
    12. 12. Delivering an Integrated, Comprehensive Service 30/06/11 1. Selection of most appropriate genomic regions for enrichment 2. Capture, sample multiplexing and sequencing 3. Data analysis and advanced filtering of variants
    13. 13. Delivering Expert Project Design <ul><li>Step 1: Selection of most appropriate genomic regions for your project and budget </li></ul>Whole exome Pre-designed, validated whole exome capture probes Coding regions are “most likely” candidates for many disorders Custom genomic regions Expert custom design of capture probes for your regions of interest Flexibility to focus on regions of clinical significance or GWAS regions
    14. 14. Delivering Class-leading Technology <ul><li>We have fully optimised the DNA capture and sequencing methodologies, so you don’t have to! </li></ul><ul><li>Step 2: Performing the capture, sample multiplexing, library preparation and sequencing </li></ul><ul><li>Options for sample indexing and multiplexing to minimise sequencing cost </li></ul><ul><li>Depth of sequencing coverage to suit your samples and project </li></ul><ul><li>Paired-end sequencing on the industry-leading Illumina HiSeq 2000 </li></ul>
    15. 15. OGT Delivers Discovery, not just Data <ul><li>Step 3: Data analysis and advanced filtering of variants </li></ul><ul><li>OGT’s dedicated analysis pipeline brings you beyond data, to a filtered list of variants relevant to your study </li></ul>SEQUENCE FILTER DISCOVER
    16. 16. OGT Genefficiency Targeted Sequencing Services <ul><li>The PLATFORM </li></ul><ul><ul><li>Core sequencing platform: Illumina HiSeq 2000 </li></ul></ul><ul><ul><li>Core sequence capture technology: Agilent SureSelect </li></ul></ul><ul><li>The PEOPLE </li></ul><ul><ul><li>Team of highly skilled molecular biologists and bioinformaticians </li></ul></ul><ul><ul><li>Core expertise in probe design </li></ul></ul><ul><ul><li>Successful development of advanced analysis solutions </li></ul></ul>
    17. 17. Outline of presentation <ul><li>Delivering a unique next generation sequencing service — Dr Mike Evans, CEO </li></ul><ul><li>Optimised bait design for targeted sequencing — Dr Jolyon Holdstock, Senior Computational Biologist </li></ul><ul><li>Adding value through analysis — Dr Volker Brenner, Head of Computational Biology </li></ul><ul><li>Summary </li></ul><ul><li>Q&A </li></ul><ul><li>Lunch </li></ul>
    18. 18. Agenda <ul><li>Important Definitions and Terminologies </li></ul><ul><li>Introduction to Targeted Enrichment </li></ul><ul><li>Custom Bait Design </li></ul>
    19. 19. Definitions and Terminologies <ul><li>Read length – The number of bases sequenced in a fragment </li></ul><ul><li>Capture efficiency </li></ul><ul><li>Paired end sequencing </li></ul><ul><li>Read depth - How many times has a base been sequenced? </li></ul>On target Off target Off target Region of Interest Region of Interest
    20. 20. Read Depth Will Vary Across a Region of Interest *Sequence Depth >20x: ~82% for Single End How many times has a base been sequenced? *Agilent. 5990-4928EN
    21. 21. Read Depth Will Vary Across a Region of Interest *Sequence Depth >20x: ~82% for Single End ~90% for Paired End How many times has a base been sequenced? *Agilent. 5990-4928EN
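The read-depth and on/off-target definitions above can be illustrated with a minimal Python sketch. It assumes reads arrive as hypothetical half-open `(start, end)` intervals on the same coordinates as the region of interest; the function names and the toy data are illustrative only and are not taken from the presentation.

```python
def per_base_depth(reads, roi_start, roi_end):
    """Count how many reads cover each base of the region of interest.

    reads: iterable of (start, end) half-open intervals (assumed format).
    Returns one depth value per base in [roi_start, roi_end).
    """
    depth = [0] * (roi_end - roi_start)
    for start, end in reads:
        # Clip each read to the region; off-target reads contribute nothing.
        lo = max(start, roi_start)
        hi = min(end, roi_end)
        for pos in range(lo, hi):
            depth[pos - roi_start] += 1
    return depth


def fraction_at_depth(depth, threshold=20):
    """Fraction of bases sequenced at least `threshold` times."""
    if not depth:
        return 0.0
    return sum(1 for d in depth if d >= threshold) / len(depth)


def on_target_fraction(reads, roi_start, roi_end):
    """Fraction of reads that overlap the region of interest by at least one base."""
    reads = list(reads)
    if not reads:
        return 0.0
    on_target = sum(1 for s, e in reads if s < roi_end and e > roi_start)
    return on_target / len(reads)


# Toy example: a 10-base region, two overlapping on-target reads, one off-target read.
toy_reads = [(0, 6), (4, 10), (20, 30)]
toy_depth = per_base_depth(toy_reads, 0, 10)
```

With real data the threshold would be the 20x used on the slides; the toy region is too small for that, so a lower threshold is more informative there.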
    22. 22. Read Depth Required for Mutation Detection <ul><li>Assuming no allelic bias, the theoretical read depth required to detect heterozygous variation with a given accuracy can be calculated using a binomial distribution </li></ul><ul><ul><li>Minimum capacity required = Region of interest (ROI) x required depth </li></ul></ul><ul><ul><li>Q30 variant detection for a 15 kb ROI requires 210 kb of sequencing capacity </li></ul></ul><ul><li>Calculations based on variation being seen in at least 2 reads </li></ul><ul><li>Should not be just one read as this could be ‘noise’ </li></ul><ul><li>Required observations could be a percentage of reads </li></ul>Table (depth required / het. call accuracy / probability of error / quality): 11 / 99% / 1:100 / Q20; 14 / 99.9% / 1:1000 / Q30; 18 / 99.99% / 1:10000 / Q40; 25 / 99.999% / 1:100000 / Q50
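The binomial reasoning on this slide can be sketched in Python as follows. This is one plausible reading of the slide rather than OGT's actual calculation: it assumes each read shows the variant allele with probability 0.5 (no allelic bias) and that a call fails when fewer than 2 reads show the variant. The function names are hypothetical.

```python
from math import comb


def prob_too_few_variant_reads(depth, min_variant_reads=2, allele_fraction=0.5):
    """Probability that fewer than `min_variant_reads` of `depth` reads show the variant."""
    return sum(
        comb(depth, k) * allele_fraction**k * (1 - allele_fraction) ** (depth - k)
        for k in range(min_variant_reads)
    )


def min_depth_for_het_call(max_error, min_variant_reads=2, allele_fraction=0.5):
    """Smallest depth at which the miss probability is at most `max_error`."""
    depth = min_variant_reads
    while prob_too_few_variant_reads(depth, min_variant_reads, allele_fraction) > max_error:
        depth += 1
    return depth


def min_capacity(roi_bases, required_depth):
    """Minimum sequencing capacity = region of interest x required depth."""
    return roi_bases * required_depth


depth_q20 = min_depth_for_het_call(1e-2)
depth_q30 = min_depth_for_het_call(1e-3)
depth_q40 = min_depth_for_het_call(1e-4)
capacity_q30 = min_capacity(15_000, depth_q30)  # 15 kb ROI from the slide
```

Under these assumptions the sketch gives depths of 11, 14 and 18 for Q20, Q30 and Q40, and 210,000 bases of capacity for the 15 kb example, in line with the slide. For Q50 it gives 22 rather than the 25 shown, so the last row of the table presumably rests on a stricter criterion that the slide does not state.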
    23. 23. Agenda <ul><li>Important Definitions and Terminologies </li></ul><ul><li>Introduction to Targeted Enrichment </li></ul><ul><li>Custom Bait Design </li></ul>
    24. 24. Why use Targeted Enrichment? <ul><li>Flexibility in choice of genomic loci </li></ul><ul><ul><li>Allows capture of specific regions of interest for SNP and Indel detection </li></ul></ul><ul><li>Cost Effectiveness </li></ul><ul><ul><li>Ideal for clinical applications </li></ul></ul><ul><ul><ul><li>Specific candidate genes are targeted </li></ul></ul></ul><ul><ul><ul><li>Fine mapping post-GWAS </li></ul></ul></ul><ul><ul><li>Cost Benefits </li></ul></ul><ul><ul><ul><li>Enables multiplexing to fill capacity </li></ul></ul></ul><ul><li>Streamlined Data Analysis </li></ul><ul><ul><li>Reduced noise due to targeted specificity </li></ul></ul>
    25. 25. Targeted Approaches Introduce Bias There are significant imbalances in the sequence coverage achieved, particularly with targeted approaches <ul><li>E.g. Agilent SureSelect* </li></ul><ul><ul><li>3.3 Mb ROI </li></ul></ul><ul><ul><li>10M reads </li></ul></ul><ul><ul><li>~80% of targeted bases covered at ≥ 20x depth </li></ul></ul><ul><ul><li>< 4% of targeted bases missed </li></ul></ul>*Ernani F. and LeProust E, Agilent. 5990-3532EN
    26. 26. Example of Design Bias - Insufficient Coverage Targeted gene sequencing can lead to some targets without the required depth of coverage *data kindly provided by C. Mattocks National Genetics Reference Lab, Salisbury, UK 14x (Q30) Inadequate Coverage
    27. 27. Solution: Intelligent Design to Improve Coverage: <ul><li>Option 1: </li></ul><ul><li>Increase coverage by increasing depth of sequencing </li></ul><ul><li>Coverage of all targets proportionally increased </li></ul><ul><li>Increased cost of sequencing </li></ul><ul><li>Some bases still missed </li></ul>(Q30) <ul><li>Option 2: </li></ul><ul><li>Intelligent design of capture probes increases under-represented loci </li></ul><ul><li>More even coverage of entire region, no loci missed (more likely to find mutations present) </li></ul><ul><li>No need to increase sequence depth overall (more cost effective) </li></ul>
    28. 28. Agenda <ul><li>Important Definitions and Terminologies </li></ul><ul><li>Introduction to Targeted Enrichment </li></ul><ul><li>Custom Bait Design </li></ul>
    29. 29. Problems Facing Users <ul><ul><li>Design tools not user friendly </li></ul></ul><ul><ul><li>Design tools only good for draft design </li></ul></ul><ul><li>Potential sources of bias </li></ul><ul><ul><li>Regions of interest too short </li></ul></ul><ul><ul><li>Bait thermodynamic behaviour </li></ul></ul><ul><ul><ul><li>GC content </li></ul></ul></ul><ul><ul><ul><li>Melting Temperature </li></ul></ul></ul><ul><ul><li>Risk of Design Errors </li></ul></ul><ul><li>OGT’s extensive experience in designing probes for microarrays allows us to minimise bias and ensure evenness of coverage giving the best chance to identify mutations </li></ul>
    30. 30. OGT’s Design Pipeline – what we need from you: <ul><li>Regions of Interest </li></ul><ul><ul><li>Gene lists </li></ul></ul><ul><ul><li>Chromosomal locations </li></ul></ul><ul><li>Genome build version </li></ul><ul><li>Data file format </li></ul><ul><ul><li>Text, Excel, etc.... </li></ul></ul><ul><ul><li>Consistent e.g. chr1: 2247628-2248537 </li></ul></ul>3. Singletons 2. Draft Design 1. Data 4. Thermo-dynamics 5. Report
    31. 31. <ul><li>Assess the output: </li></ul><ul><ul><li>Coverage </li></ul></ul><ul><ul><li>Bait distribution </li></ul></ul><ul><ul><li>Repeatmasking </li></ul></ul>Run Draft Design Region of Interest 3. Singleton Baits 2. Draft Design 1. Data 4. Bait Thermo-dynamics 5. Report
    32. 32. <ul><li>Assess the output: </li></ul><ul><ul><li>Coverage </li></ul></ul><ul><ul><li>Bait distribution </li></ul></ul><ul><ul><li>Repeatmasking </li></ul></ul>Run Draft Design Region of Interest Repeatmasking 3. Singleton Baits 2. Draft Design 1. Data 4. Bait Thermo-dynamics 5. Report
    33. 33. Correction for Singleton Baits (Before / After) <ul><li>Review the draft design and identify any regions covered by a single bait </li></ul><ul><ul><li>These regions span less than 120 bases </li></ul></ul><ul><li>Add additional singleton baits to the design </li></ul><ul><li>This ensures that small regions are captured as well as large regions </li></ul><ul><li>Advantage - Improves evenness of capture across the design </li></ul>3. Singleton Baits 2. Draft Design 1. Data 4. Bait Thermo-dynamics 5. Report
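The singleton step can be sketched in Python as below. This is only an illustration, not OGT's pipeline. It assumes regions arrive as text in the `chr1: 2247628-2248537` style requested earlier in the talk, that coordinates are inclusive, and that a region shorter than the 120-base figure on the slide is covered by a single bait. The second region in the example is invented.

```python
import re

BAIT_LENGTH = 120  # from the "less than 120 bases" note on the slide

# Accepts "chr1:100-200" and "chr1: 100-200" (optional space after the colon).
REGION_PATTERN = re.compile(r"^(chr\w+):\s*(\d+)-(\d+)$")


def parse_region(text):
    """Parse a region string into (chromosome, start, end)."""
    match = REGION_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"Unrecognised region format: {text!r}")
    chrom, start, end = match.group(1), int(match.group(2)), int(match.group(3))
    if end < start:
        raise ValueError(f"Region end precedes start: {text!r}")
    return chrom, start, end


def is_singleton(region, bait_length=BAIT_LENGTH):
    """True if the region spans fewer bases than one bait (inclusive coordinates assumed)."""
    _, start, end = region
    return (end - start + 1) < bait_length


def find_singletons(region_texts, bait_length=BAIT_LENGTH):
    """Return the parsed regions that would receive additional singleton baits."""
    regions = [parse_region(text) for text in region_texts]
    return [region for region in regions if is_singleton(region, bait_length)]


# First region is the format example from the talk; the second is hypothetical.
example_regions = ["chr1: 2247628-2248537", "chr2:1000-1099"]
singletons = find_singletons(example_regions)
```

How many extra baits to add per singleton region is a design decision the slide does not quantify, so the sketch stops at identifying the regions.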
    34. 34. Correction for Bait Thermodynamics <ul><li>GC content </li></ul><ul><li>Calculate GC content for all baits </li></ul><ul><li>Identify those baits where GC content is extreme (for instance >65% or <40%) </li></ul><ul><li>Add additional copies of these baits </li></ul><ul><li>Melting temperature (Tm) </li></ul><ul><li>Calculate the Tm for all baits </li></ul><ul><li>Identify those baits where Tm is extreme (e.g. >75 °C) </li></ul><ul><li>Add additional copies of these baits </li></ul>Region of Interest: GC extreme, Tm extreme 3. Singleton Baits 2. Draft Design 1. Data 4. Bait Thermo-dynamics 5. Report
    35. 35. Customer Report <ul><li>Design Parameters </li></ul><ul><li>Depth of Coverage </li></ul><ul><ul><li>On target / Off target </li></ul></ul><ul><ul><li>Regions not covered – and why not </li></ul></ul><ul><li>Bait Details </li></ul><ul><ul><li>Singletons </li></ul></ul><ul><ul><li>GC distribution </li></ul></ul><ul><ul><li>Tm distribution </li></ul></ul><ul><li>Library Design </li></ul><ul><ul><li>Baits generated </li></ul></ul>3. Singleton Baits 2. Draft Design 1. Data 4. Bait Thermo-dynamics 5. Report
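The GC-content correction can be sketched in Python as below. The 40% and 65% thresholds are the "for instance" values from the slide. The function names, the number of extra copies and the toy bait sequences are assumptions made for illustration. The melting-temperature check is left out because the slide does not say how the melting temperature is calculated.

```python
def gc_percent(seq):
    """Percentage of G and C bases in a bait sequence."""
    if not seq:
        raise ValueError("Empty bait sequence")
    seq = seq.upper()
    gc = sum(1 for base in seq if base in "GC")
    return 100.0 * gc / len(seq)


def is_gc_extreme(seq, low=40.0, high=65.0):
    """True if GC content falls below `low` or above `high` (slide's example limits)."""
    pct = gc_percent(seq)
    return pct < low or pct > high


def boost_extreme_baits(baits, extra_copies=1, low=40.0, high=65.0):
    """Return a bait list in which GC-extreme baits appear `extra_copies` more times.

    `extra_copies` is a hypothetical parameter; the slide only says
    "add additional copies" without giving a number.
    """
    boosted = []
    for bait in baits:
        copies = 1 + extra_copies if is_gc_extreme(bait, low, high) else 1
        boosted.extend([bait] * copies)
    return boosted


# Toy baits (real baits would be far longer): 100%, 40% and 0% GC.
toy_baits = ["GGGCCCGGGC", "ATGCATGCAT", "ATATATATAT"]
boosted_baits = boost_extreme_baits(toy_baits)
```

A bait at exactly 40% GC is treated as within range here, since the slide's example uses strict inequalities.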
    36. 36. Advantages of OGT’s Approach <ul><ul><li>Better ‘evenness’ of coverage helps ensure no regions are missed and maximises the likelihood of variant detection </li></ul></ul><ul><ul><li>Improved overall capture efficiency and on-target performance mean cost-effective sequencing downstream </li></ul></ul><ul><ul><li>Increased capture efficiency of SNPs and Indels means an increased likelihood of detection </li></ul></ul><ul><ul><li>Reduction of risk </li></ul></ul>
    37. 37. Summary <ul><li>Custom design of regions for targeted sequencing offers significant flexibility for many applications </li></ul><ul><li>Expert probe design will ensure: </li></ul><ul><ul><li>Evenness of coverage across the entire region </li></ul></ul><ul><ul><li>Maximum likelihood of discovery of variants </li></ul></ul><ul><ul><li>Efficient and cost effective use of sequencer capacity </li></ul></ul><ul><li>Overall these modifications make OGT’s capture perform better </li></ul>
    38. 38. Outline of presentation <ul><li>Delivering a unique next generation sequencing service — Dr Mike Evans, CEO </li></ul><ul><li>Optimised bait design for targeted sequencing — Dr Jolyon Holdstock, Senior Computational Biologist </li></ul><ul><li>Adding value through analysis — Dr Volker Brenner, Head of Computational Biology </li></ul><ul><li>Summary </li></ul><ul><li>Q&A </li></ul><ul><li>Lunch </li></ul>
    39. 39. Adding Value Through Analysis <ul><li>Introduction </li></ul><ul><li>NGS data analysis </li></ul><ul><ul><li>Primary analysis </li></ul></ul><ul><ul><ul><li>Mapping and assembly </li></ul></ul></ul><ul><ul><ul><li>Q score re-calibration </li></ul></ul></ul><ul><ul><ul><li>NGS sequencing QC </li></ul></ul></ul><ul><ul><ul><li>NGS alignment QC </li></ul></ul></ul><ul><ul><li>Secondary analysis </li></ul></ul><ul><ul><ul><li>SNP and Indel calling </li></ul></ul></ul><ul><ul><ul><li>Annotation and evaluation pipeline </li></ul></ul></ul><ul><ul><ul><li>SIFT and PolyPhen </li></ul></ul></ul><ul><li>Deliverables </li></ul><ul><li>Case study </li></ul><ul><li>Summary </li></ul>
    40. 40. The Analysis Challenge: Sequencer → Hard drive with ~4Gb per exome → Publication
    41. 41. Raw Data: FASTQ (standard text representation of short reads) <ul><li>FASTQ uses four lines per sequence. </li></ul><ul><ul><li>Line 1: ' @ ' followed by a sequence identifier </li></ul></ul><ul><ul><li>Line 2: raw sequence letters </li></ul></ul><ul><ul><li>Line 3: ' + ' (and optional sequence identifier) </li></ul></ul><ul><ul><li>Line 4: quality values for the sequence in Line 2. Must contain the same number of symbols as letters in the sequence. (The letters encode Phred Quality Scores from 0 to 93 using ASCII 33 to 126) </li></ul></ul><ul><li>Example </li></ul><ul><ul><ul><li>@SEQ_ID </li></ul></ul></ul><ul><ul><ul><li>GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT </li></ul></ul></ul><ul><ul><ul><li>+ </li></ul></ul></ul><ul><ul><ul><li>!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65 </li></ul></ul></ul>
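The four-line layout described on this slide can be read with a few lines of Python. The following is a minimal sketch, not OGT's pipeline code: the names `FastqRecord` and `parse_fastq` are hypothetical, and it assumes the simple case of exactly four lines per record (no wrapped sequences) with Phred scores stored as ASCII 33 to 126.

```python
from typing import Iterator, List, NamedTuple


class FastqRecord(NamedTuple):
    identifier: str
    sequence: str
    qualities: List[int]  # Phred scores decoded from ASCII (offset 33)


def parse_fastq(lines: List[str]) -> Iterator[FastqRecord]:
    """Yield records from FASTQ text that uses exactly four lines per read."""
    cleaned = [line.rstrip("\n") for line in lines if line.strip()]
    if len(cleaned) % 4 != 0:
        raise ValueError("FASTQ input must contain four lines per record")
    for i in range(0, len(cleaned), 4):
        header, seq, plus, qual = cleaned[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"Malformed record number {i // 4 + 1}")
        if len(seq) != len(qual):
            raise ValueError("Quality string and sequence differ in length")
        # Each quality symbol encodes one Phred score: Q = ASCII code - 33
        yield FastqRecord(header[1:], seq, [ord(c) - 33 for c in qual])


# The example record shown on the slide
example = [
    "@SEQ_ID",
    "GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT",
    "+",
    "!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65",
]
record = next(parse_fastq(example))
print(record.identifier, len(record.sequence), record.qualities[:4])
```

The length check mirrors the rule on the slide that Line 4 must contain the same number of symbols as Line 2.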
    42. 42. Phred Quality Scores <ul><li>Phred is an accurate base-caller used for capillary traces (Ewing et al., Genome Research 1998) </li></ul><ul><li>Each called base is given a quality score Q </li></ul><ul><li>Quality based on simple metrics (such as peak spacing) calibrated against a database of hand-edited data </li></ul><ul><li>Q (Phred) = -10 * log10(estimated probability that the call is wrong) </li></ul><ul><ul><li>Q30 often used as a threshold for useful sequence data </li></ul></ul><table><tr><th>Phred Quality Score</th><th>Probability of incorrect base call</th><th>Base call accuracy</th></tr><tr><td>10</td><td>1 in 10</td><td>90 %</td></tr><tr><td>20</td><td>1 in 100</td><td>99 %</td></tr><tr><td>30</td><td>1 in 1000</td><td>99.9 %</td></tr><tr><td>40</td><td>1 in 10000</td><td>99.99 %</td></tr></table>
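The relationship between Q and the error probability can be checked numerically. This is a small illustrative sketch; the function names are hypothetical and only the formula Q = -10 * log10(p) comes from the slide.

```python
import math


def phred_to_error_probability(q: float) -> float:
    """Invert Q = -10 * log10(p) to get the probability that a call is wrong."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p: float) -> float:
    """Compute the Phred score for an error probability p with 0 < p <= 1."""
    if not 0 < p <= 1:
        raise ValueError("p must be in the interval (0, 1]")
    return -10 * math.log10(p)


# Reproduce the rows of the table on the slide
for q in (10, 20, 30, 40):
    p = phred_to_error_probability(q)
    print(f"Q{q}: error probability {p:g}, accuracy {100 * (1 - p):.2f} %")
```

Each step of 10 in Q corresponds to a tenfold drop in the estimated error rate, which is why Q30 means one expected error in 1000 calls.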
    43. 43. File Formats: FASTQ, SAM, BAM, VCF <ul><li>FASTQ is FASTA with quality scores added. Standard output format of NGS basecalling </li></ul><ul><li>SAM and BAM are equivalent formats for describing alignments of reads to a reference genome </li></ul><ul><li>SAM: text file </li></ul><ul><li>BAM: compressed binary, indexed, so it is possible to access reads mapping to a segment without decompressing the entire file </li></ul><ul><li>BAM is used by IGV and other software </li></ul><ul><li>Current Standard Binary Format containing: </li></ul><ul><ul><li>Meta Information (read groups, algorithm details) </li></ul></ul><ul><ul><li>Sequence and Quality Scores </li></ul></ul><ul><ul><li>Alignment information </li></ul></ul><ul><li>VCF file: text file that lists all called variants (= differences from the reference genome) </li></ul>
    44. 44. NGS Data Analysis: A rose is a rose is a rose <ul><li>Just FASTQ files </li></ul><ul><li>Mapped and assembled data (vs. genome or exome? De-duplicated? Locally re-aligned? Indexed?) </li></ul><ul><li>All of the above plus VCF file </li></ul><ul><li>Annotation of variants against genes, exons, transcripts... </li></ul><ul><li>Links to external resources </li></ul><ul><li>Sequence alignments for visual inspection of variant calls </li></ul><ul><li>Filtered and prioritised data </li></ul><ul><li>Multi-genome analysis </li></ul><ul><li>*) Kevin Rose (born Robert Kevin Rose , February 21, 1977) is an American Internet entrepreneur </li></ul>#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT A_36_B100184 65 . T C 6.2 . DP=27;AF1=0.4999;CI95=0.5,0.5;DP4=7,12,5,3;MQ=44;FQ=8.65;PV4=0.4,4.2 A_36_B100224 48 . G A 225 . DP=80;AF1=0.5;CI95=0.5,0.5;DP4=32,4,38,3;MQ=56;FQ=8.65 A_36_B100255 42 . A C 22 . DP=32;AF1=0.5;CI95=0.5,0.5;DP4=23,2,4,3;MQ=20;FQ=25;PV4=0.057,1.9e-06,1,0.004 A_36_B100333 76 . G A 225 . DP=50;AF1=0.5;CI95=0.5,0.5;DP4=10,9,18,9;MQ=57;FQ=225;PV4=0.3 ...
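To show what a raw VCF record contains before any annotation, here is a minimal sketch that splits one of the records from the slide into its fixed columns and unpacks the INFO field. The helper names `parse_info` and `parse_vcf_record` are hypothetical. Real VCF files are tab-separated; splitting on any whitespace is an assumption made here because the slide text is space-separated.

```python
from typing import Dict, Union

# The eight fixed columns named in the VCF header line on the slide
VCF_COLUMNS = ("CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO")


def parse_info(info: str) -> Dict[str, Union[str, bool]]:
    """Split 'KEY=VALUE;KEY=VALUE;FLAG' into a dictionary."""
    fields: Dict[str, Union[str, bool]] = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True  # keys without '=' are flags
    return fields


def parse_vcf_record(line: str) -> dict:
    """Parse the first eight columns of a single VCF data line."""
    values = line.split()
    if len(values) < len(VCF_COLUMNS):
        raise ValueError("A VCF record needs at least eight columns")
    record: dict = dict(zip(VCF_COLUMNS, values[:8]))
    record["POS"] = int(record["POS"])
    record["QUAL"] = None if record["QUAL"] == "." else float(record["QUAL"])
    record["INFO"] = parse_info(record["INFO"])
    return record


line = "A_36_B100224 48 . G A 225 . DP=80;AF1=0.5;CI95=0.5,0.5;DP4=32,4,38,3;MQ=56;FQ=8.65"
rec = parse_vcf_record(line)
print(rec["CHROM"], rec["POS"], rec["REF"], rec["ALT"], rec["INFO"]["DP"])
```

Even after parsing, the record only says where the sequence differs from the reference, which is why the later annotation and filtering steps matter.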
    45. 45. Adding Value Through Analysis <ul><li>Introduction </li></ul><ul><li>NGS data analysis </li></ul><ul><ul><li>Primary analysis </li></ul></ul><ul><ul><ul><li>Mapping and assembly </li></ul></ul></ul><ul><ul><ul><li>Q score re-calibration </li></ul></ul></ul><ul><ul><ul><li>NGS sequencing QC </li></ul></ul></ul><ul><ul><ul><li>NGS alignment QC </li></ul></ul></ul><ul><ul><li>Secondary analysis </li></ul></ul><ul><ul><ul><li>SNP and Indel calling </li></ul></ul></ul><ul><ul><ul><li>Annotation and evaluation pipeline </li></ul></ul></ul><ul><ul><ul><li>SIFT and PolyPhen </li></ul></ul></ul><ul><li>Deliverables </li></ul><ul><li>Case study </li></ul><ul><li>Summary </li></ul>
    46. 46. Primary Analysis - Mapping and Alignment: Raw Sequence Files (FASTQ Format) → Mapping (BWA/Bowtie) → Raw Alignment Files (SAM/BAM Format) → Local Realignment around InDels (GATK) → Duplicate marking (Picard) → Quality score re-calibration (Picard) → Analysis-ready Alignment (SAM/BAM Format)
    47. 47. Why Mark Duplicates and Realignment around Indels?
    48. 48. Why Mark Duplicates and Realignment around Indels? 3 incorrect calls within 40bp!
    49. 49. Primary Analysis - Mapping and Alignment: Raw Sequence Files (FASTQ Format) → Mapping (BWA/Bowtie) → Raw Alignment Files (SAM/BAM Format) → Local Realignment around InDels (GATK) → Duplicate marking (Picard) → Quality score re-calibration (Picard) → Analysis-ready Alignment (SAM/BAM Format)
    50. 50. NGS Variant Calling Methods <ul><li>Option 1 - Hard filtering </li></ul><ul><li>Example: SNP can only be called if </li></ul><ul><ul><li>read depth >10 </li></ul></ul><ul><ul><li>>35% of reads carry SNP </li></ul></ul><ul><li>Effective filtering </li></ul><ul><li>Transparent to user </li></ul><ul><li>Simplistic approach </li></ul><ul><li>Will miss high-quality calls that don’t pass threshold </li></ul><ul><li>Option 2 - Statistical analysis </li></ul><ul><li>Based on quality scores of individual bases, the alignment and statistical probability models </li></ul><ul><li>Robust </li></ul><ul><li>Optimum balance of sensitivity and specificity due to the use of statistical models </li></ul><ul><li>Fewer false positive and false negative SNP calls </li></ul><ul><li>Requires correctly pre-processed data with reliable quality scores </li></ul>
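The hard-filtering rule in Option 1 can be written down directly. The sketch below uses the two example thresholds from the slide (read depth >10, >35% of reads carrying the SNP). The function names are hypothetical, and taking the read counts from the samtools-style DP4 tag (ref-forward, ref-reverse, alt-forward, alt-reverse) is an assumption made only for illustration.

```python
from typing import Tuple


def counts_from_dp4(dp4: str) -> Tuple[int, int]:
    """Return (reference reads, variant reads) from a 'DP4=a,b,c,d' value."""
    ref_fwd, ref_rev, alt_fwd, alt_rev = (int(x) for x in dp4.split(","))
    return ref_fwd + ref_rev, alt_fwd + alt_rev


def passes_hard_filter(ref_reads: int, alt_reads: int,
                       min_depth: int = 10,
                       min_alt_fraction: float = 0.35) -> bool:
    """Call a SNP only if depth > min_depth and the variant fraction > min_alt_fraction."""
    depth = ref_reads + alt_reads
    if depth <= min_depth:
        return False
    return alt_reads / depth > min_alt_fraction


# DP4 values taken from the VCF records shown earlier in the deck
for dp4 in ("7,12,5,3", "32,4,38,3"):
    ref_reads, alt_reads = counts_from_dp4(dp4)
    print(dp4, passes_hard_filter(ref_reads, alt_reads))
```

The first record has 8 of 27 reads carrying the variant (about 30%) and is rejected outright, which illustrates the drawback named on the slide: a fixed threshold is transparent but discards calls that fall just short of it, whatever their quality.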
    51. 51. Base Quality Score Re-Calibration [Figure panels: Before Recalibration, After Recalibration] Source: The Broad Institute http://www.broadinstitute.org/files/shared/mpg/nextgen2010/nextgen_poplin.pdf
    52. 52. Primary Analysis – Raw data and assembly QC: Raw Sequence Files (FASTQ Format) → Mapping (BWA/Bowtie) → Raw Alignment Files (SAM/BAM Format) → Local Realignment around InDels (GATK) → Duplicate marking (Picard) → Quality score re-calibration (Picard) → Analysis-ready Alignment (SAM/BAM Format)
    53. 53. Primary Analysis – Raw data and assembly QC: Raw Sequence Files (FASTQ Format) → Mapping (BWA/Bowtie) → Raw Alignment Files (SAM/BAM Format) → Local Realignment around InDels (GATK) → Duplicate marking (Picard) → Quality score re-calibration (Picard) → Analysis-ready Alignment (SAM/BAM Format). Sequence QC check (FastQC) → Raw data QC Report. Alignment QC check (Picard) → AlignmentQC Report
    54. 54. Secondary Analysis: SNP and Indel calling, annotation and filtering. Analysis-ready alignment (SAM/BAM Format) → GATK Unified Genotyper → SNPs and InDels (VCF Format) → Variant Evaluation (OGT) → Comprehensive interactive OGT Report, alongside the AlignmentQC Report and Sequence QC Report <ul><li>Known variant? </li></ul><ul><li>Impact on gene expression? </li></ul><ul><li>Splicing affected? </li></ul><ul><li>Non-synonymous or frameshift mutation? </li></ul><ul><li>Impact on protein function? </li></ul><ul><li>How confident are we in the call? </li></ul><ul><li>Zygosity? </li></ul>
    55. 55. SNP/Indel Classification (standard analysis) <ul><li>We check and annotate every single detected SNP and Indel against all human Ensembl genes and transcripts and dbSNP </li></ul><ul><li>dbSNP annotation: </li></ul><ul><li>Is the variant known? </li></ul><ul><li>Obtain allele frequency </li></ul><ul><li>Does it affect any of the following? </li></ul><ul><li>Promoter region </li></ul><ul><li>UTR </li></ul><ul><li>Splice sites or intronic region </li></ul><ul><li>CDS </li></ul><ul><ul><li>Synonymous mutation </li></ul></ul><ul><ul><li>Non-synonymous mutation </li></ul></ul><ul><ul><li>Frameshift mutation </li></ul></ul><ul><ul><li>Stop codon (truncated/elongated protein sequence) </li></ul></ul><ul><ul><li>Overlap with protein domain </li></ul></ul><ul><ul><li>Consequence on protein function predicted (SIFT & PolyPhen) </li></ul></ul>
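The CDS categories listed on this slide can be illustrated with the standard genetic code. This is a simplified sketch, not the Ensembl-based pipeline described in the talk: the functions `classify_codon_change` and `classify_indel` are hypothetical, they look at a single codon or allele pair in isolation, and they ignore transcripts, strand and splice context.

```python
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_change(ref_codon: str, alt_codon: str) -> str:
    """Classify a single-codon substitution within a coding sequence."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "stop gained"   # truncated protein sequence
    if ref_aa == "*":
        return "stop lost"     # elongated protein sequence
    return "non-synonymous"


def classify_indel(ref_allele: str, alt_allele: str) -> str:
    """An indel shifts the reading frame unless its length is a multiple of 3."""
    length_change = abs(len(alt_allele) - len(ref_allele))
    return "in-frame" if length_change % 3 == 0 else "frameshift"


print(classify_codon_change("GAG", "GTG"))  # Glu -> Val
print(classify_codon_change("TGG", "TGA"))  # Trp -> stop
print(classify_indel("A", "AT"))
```

Only the non-synonymous class is passed on to predictors such as SIFT and PolyPhen, since those tools score amino acid substitutions.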
    56. 56. SIFT – Sorts Intolerant From Tolerant Mutations <ul><li>SIFT predicts whether an amino acid substitution affects protein function, based on </li></ul><ul><ul><li>sequence homology (phylogenetic conservation) </li></ul></ul><ul><ul><li>the physical properties of amino acids. </li></ul></ul><ul><li>SIFT can be applied to naturally occurring non-synonymous polymorphisms and laboratory-induced mutations. </li></ul>
    57. 57. PolyPhen: Prediction of Functional Effect of nsSNPs <ul><li>PolyPhen (= Polymorphism Phenotyping) is an automatic tool for prediction of possible impact of an amino acid substitution on the structure and function of a human protein. This prediction is based on straightforward empirical rules which are applied to the sequence, phylogenetic and structural information characterizing the substitution </li></ul>
    58. 58. OGT Processing Overview (Data → Information) <ul><li>Standard Level: Individual Genome Analysis </li></ul><ul><li>Advanced Level: Multi Genome Analysis, Data Gathering and Comparison. Perform pairwise genome analysis; filter out variants present in any “baseline” exome (e.g. somatic tissue, healthy sibling) AND not present in all “case” exomes </li></ul><ul><li>Expert Level: Tailored analysis based on client’s individual requirements. Study-specific additional in-depth filtering and analysis </li></ul>
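The multi-genome filter described for the advanced level amounts to a set operation. The sketch below is one reading of the slide text, namely "keep only variants found in every case exome and in no baseline exome". The function name `shared_case_variants` and the example variants are hypothetical and chosen purely for illustration.

```python
from typing import Iterable, Set, Tuple

# A variant is identified here by (chromosome, position, ref allele, alt allele)
Variant = Tuple[str, int, str, str]


def shared_case_variants(cases: Iterable[Iterable[Variant]],
                         baselines: Iterable[Iterable[Variant]]) -> Set[Variant]:
    """Keep variants present in all case exomes and absent from every baseline exome."""
    case_sets = [set(c) for c in cases]
    if not case_sets:
        return set()
    in_all_cases = set.intersection(*case_sets)
    in_any_baseline = set().union(*(set(b) for b in baselines))
    return in_all_cases - in_any_baseline


# Hypothetical data: two affected individuals and one healthy sibling
case_1 = [("chr1", 100, "A", "G"), ("chr2", 200, "C", "T"), ("chr3", 300, "G", "A")]
case_2 = [("chr1", 100, "A", "G"), ("chr2", 200, "C", "T")]
healthy_sibling = [("chr2", 200, "C", "T")]

candidates = shared_case_variants([case_1, case_2], [healthy_sibling])
print(candidates)
```

In this toy data the chr3 variant is dropped because it is missing from one case, and the chr2 variant is dropped because the healthy sibling carries it, leaving a single candidate.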
    59. 59. NGS Data Delivery Hard drive (or FTP) ship data browse Double click! Copy data to shared drive or local hard drive and...
    60. 60. NGS Data Delivery Hard drive (or FTP) ship data browse Comprehensive HTML analysis report Copy data to shared drive or local hard drive and...
    61. 61. NGS Data Delivery Hard drive (or FTP) ship data browse Comprehensive HTML analysis report Copy data to shared drive or local hard drive and... File location & share results
    62. 62. Analysis Report: Summary Section
    63. 63. Analysis Report: Summary Section
    64. 64. Analysis Report: Summary Section
    65. 65. Analysis Report: Summary Section
    66. 66. Analysis Report: QC Section – Read QC
    67. 67. Analysis Report: QC Section – Read QC
    68. 68. Analysis Report: QC Section – Read QC
    69. 69. Analysis Report: QC Section – Read QC
    70. 70. Analysis Report: QC Section – Read QC
    71. 71. Analysis Report: QC Section – Alignment QC
    72. 72. Analysis Report: QC Section – Alignment QC
    73. 73. Analysis Report: QC Section – Alignment QC
    74. 74. <ul><li>Analysis Section - Overview </li></ul>
    75. 75. <ul><li>Analysis Section - Overview </li></ul>
    76. 76. <ul><li>The Variant Table View </li></ul>Filter Interface
    77. 77. <ul><li>The Variant Table View </li></ul>Data display Data export
    78. 78. <ul><li>The Variant Table View – External Links </li></ul>
    79. 79. <ul><li>The Variant Table View – External Links </li></ul>
    80. 80. <ul><li>The Detailed Variant View </li></ul>
    81. 81. <ul><li>The Detailed Variant View </li></ul>
    82. 82. <ul><li>Predicted Consequences on Protein Function </li></ul>
    83. 83. <ul><li>Predicted Consequences on Protein Function </li></ul>
    84. 84. <ul><li>Predicted Consequences on Protein Function </li></ul>
    85. 85. Alignment View of Selected Variant in IGV
    86. 86. Alignment View of Selected Variant in IGV
    87. 87. Alignment View of Selected Variant in IGV
    88. 88. Interactive Data Filtering
    89. 89. Interactive Data Filtering
    90. 90. Case Study: a published exome study <ul><ul><li>Multi-exome study reveals the causative mutation of a monogenic disorder </li></ul></ul>Standard Analysis Advanced Analysis
    91. 91. Analysis Report: Supplementary Section
    92. 92. Summary: OGT offers fast, accurate & powerful NGS analysis <ul><ul><li>Standard Analysis </li></ul></ul><ul><ul><li>Robust statistical data analysis </li></ul></ul><ul><ul><li>Comprehensive variant annotation </li></ul></ul><ul><ul><li>Interactive filtering and prioritisation of data based on </li></ul></ul><ul><ul><ul><li>chromosomal region </li></ul></ul></ul><ul><ul><ul><li>allele frequency / novelty </li></ul></ul></ul><ul><ul><ul><li>zygosity </li></ul></ul></ul><ul><ul><ul><li>confidence score </li></ul></ul></ul><ul><ul><ul><li>severity of mutation </li></ul></ul></ul><ul><ul><li>Advanced Analysis </li></ul></ul><ul><ul><li>Multi-genome comparison </li></ul></ul><ul><ul><li>Bespoke analysis </li></ul></ul><ul><ul><li>Tailored to your specific requirements </li></ul></ul>Let us help you with your workload
    93. 93. Outline of Presentation <ul><li>Delivering a unique next generation sequencing service — Dr Mike Evans, CEO </li></ul><ul><li>Optimised bait design for targeted sequencing — Dr Jolyon Holdstock, Senior Computational Biologist </li></ul><ul><li>Adding value through analysis — Dr Volker Brenner, Head of Computational Biology </li></ul><ul><li>Summary </li></ul><ul><li>Q&A </li></ul><ul><li>Lunch </li></ul>
    94. 94. Please Enjoy Your Lunch! <ul><li>Come and visit us at Booth #562 </li></ul><ul><li>Complete a survey for the chance to win a Kindle* eBook Reader </li></ul><ul><li>Come to our wine reception tomorrow (Sunday) at 17:00 at our booth </li></ul>*For full Terms and Conditions please visit www.ogt.co.uk/genefficiency/ESHGsurvey.html
    95. 95. Thank you www.ogt.co.uk