6. Ulrike Schoeck- GATC Biotech

Provisioning bioinformatics: are we prepared?

Transcript of "6. Ulrike Schoeck- GATC Biotech"

  1. Eagle Genomics Symposium: "Provisioning bioinformatics - are we prepared?" Ulrike Schoeck, GATC Biotech, April 5th, 2011. GATC Biotech confidential VII © 2007-2011
  2. Agenda
     I. Introduction to GATC Biotech, providing sequencing service
     II. Presentation of in-house sequencing technologies
     III. Bioinformatics - definition and history
     IV. Evolution of sequencing
     V. Sequence analysis - what do we have to face everyday?
     VI. Sequencing applications - what is possible?
     VII. Conclusions - are we prepared?
  3. GATC Biotech - where we are
  4. GATC Biotech
     • leading European commercial sequencing service provider
     • over 20 years of experience and know-how
     • ISO-certified since 1997
     • 100% privately owned, self-financed & independent
     • more than 125 employees in 5 subsidiaries, 22 sales offices
     • 3-shift sequencing labs in Germany (Konstanz & Duesseldorf) and UK
     • over 10,000 customers all over the world (industry & academia)
     • Illumina Certified Service Provider
     Complete and integrated sequencing & bioinformatic solutions: from single sample to ultra high throughput
  5. Sequencing technologies in house
     • Applied Biosystems ABI 3730xl: since 1996
     • Roche / 454 Genome Sequencer FLX: since 2006
     • Illumina / Solexa HiSeq 2000: since 2006
     • Pacific Biosciences PacBio RS: May 2011
  6. Sequencing capacity [Chart: yearly sequencing capacity in Tb (axis 0 to 70), from "till 2006" to July 2011 in half-year steps; instruments marked: Applied Biosystems ABI 3730xl, GS FLX, GA, HiSeq, PacBio RS]
  7. System comparison
     system: GS FLX | HiSeq 2000 | PacBio RS
     available since: 2005 (GS 20 by 454 Life Sciences) | 2006 (Genetic Analyzer by Solexa) | 2010
     device: PicoTiterPlate w/ wells | flowcells w/ channels | SMRT cells w/ zero-mode waveguides
     library: DNA fragmentation, adapter ligation (all three systems)
     amplification: emulsion PCR | bridging PCR | none
     sequencing: sequencing by synthesis, pyrosequencing | sequencing by synthesis, cyclic reversible termination | sequencing by synthesis, single molecule, real-time
  8. Comparison
     system: GS FLX | HiSeq 2000 | PacBio RS
     read length: Ø 400 bases | 50 bases or 100 bases | > 1,000 bases
     mate pairs / paired end: averaging 140-200+ bases, insert sizes ~ 3 kb | 2 x 50 or 2 x 100 bases, insert sizes 300 b, ~ 3 kb & higher | strobe reads
     # of reads / run: > 1 million | > 800,000,000 (single reads), > 1,600,000,000 (paired end) | 75,000 ZMWs / SMRT cell
     base integration per cycle: same bases in one cycle (homopolymers) | base after base, cycle per cycle | base after base, continuously
  9. Bioinformatics
     Definition:
     • Science explaining biology by using information technologies (computational biology)
     • Providing algorithms, databases, user interfaces and statistical applications for specifying potential scientific significance
  10. Bioinformatics
      Main object:
      • Presentation of macromolecules as linear chains of defined components or as sequences of symbols
      • Main application in bioinformatics: comparison of sequences for detecting homology (function, structure)
      GCGTCCTCGGGCTTGGCGA ACTGGGCGGCGGCGGTGGC GGGCAGCAGCATGGGGGCG GCA...
  11. Bioinformatics
      Main object:
      • Presentation of macromolecules as linear chains of defined components or as sequences of symbols
      • Main application in bioinformatics: comparison of sequences for detecting homology (function, structure)
      GCGTCCTCGGGCTTGGCGA ACTGGGCGGCGGCGGTGGC GGGCAGCAGCATGGGGGCG GCA...
  12. History
      • Manual analyses of sequence homologies using standard word processing programmes
      • Lasting change in molecular biology by introducing efficient computer algorithms: sequence alignment, phylogenetics, pattern matching, web-based database searches, …
  13. Evolution of Sequencing
      • Sanger sequencing (1 read per sequencing run)
      • Roche (1,000,000 reads per sequencing run)
      • Illumina (1,600,000,000 reads per sequencing run)
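To put these read counts into perspective, they can be multiplied by the read lengths given in the comparison on slide 8 (Ø 400 bases for the GS FLX, 100 bases for HiSeq 2000 paired-end reads). The following is only a rough sketch: the slide figures are lower bounds ("> 1 million", "> 1,600,000,000"), and the names `PLATFORMS` and `bases_per_run` are illustrative, not taken from the talk.

```python
# Rough lower-bound estimate of sequenced bases per run.
# Read counts and read lengths are the figures quoted on the slides;
# treating every read as having exactly the quoted length is an assumption.
PLATFORMS = {
    # name: (reads per run, bases per read)
    "GS FLX": (1_000_000, 400),
    "HiSeq 2000 (paired end, 2 x 100)": (1_600_000_000, 100),
}

bases_per_run = {
    name: reads * read_length
    for name, (reads, read_length) in PLATFORMS.items()
}

for name, bases in bases_per_run.items():
    print(f"{name}: {bases / 1e9:.1f} Gb per run")
```

Under these assumptions a single HiSeq 2000 run yields several hundred times the bases of a GS FLX run, which is the scale of data the later slides refer to.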
  14. Evolution of Sequencing [Chart: yearly sequencing capacity in Tb (axis 0 to 70), from "till 2006" to July 2011 in half-year steps]
  15. Sequence analysis - today
      • Massively produced sequence data using next generation sequencing technologies
      • Advantages: applications, applications, applications…; runtime; costs
      • Challenges: data analysis and interpretation; hardware infrastructure; data storage; software development; error rates
  16. Sequence analysis - today
      • Massively produced sequence data using next generation sequencing technologies
      • Advantages: applications, applications, applications…; runtime; costs
      Example: de novo sequencing
      @HWI-ST143_0345:7:1:1200:2150#CGATGT/1
      TTCTTCTGATGCCGGCATCCCTGCTTGCAGGTGTGAAG
      +
      HHHHHHHHHHHHHHHHFHHHHHHHHHHHHHF=FBF:CF
      @HWI-ST143_0345:7:1:1310:2072#CGATGT/1
      CGTTTCTAAAGCACCCACTATGGATGNNCAGCAGGACA
      +
      GFFGEFEFFFGDGGCEBEE=EEEEE9##55++;(1@A:
      ...
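The two reads shown on this slide are in FASTQ format: four lines per read (header, bases, a "+" separator, and one quality character per base). A minimal parsing sketch follows. It assumes a Phred+33 quality offset, which is a guess based on characters such as `#` and `+` appearing in the quality strings; the slide does not state the encoding. The names `FastqRecord` and `parse_fastq` are illustrative only.

```python
from typing import Iterator, List, NamedTuple

PHRED_OFFSET = 33  # assumption: Phred+33 encoding, not stated on the slide


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    qualities: List[int]


def parse_fastq(text: str) -> Iterator[FastqRecord]:
    """Yield one record per four-line FASTQ entry."""
    lines = [line.strip() for line in text.strip().splitlines()]
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ input must contain four lines per read")
    for i in range(0, len(lines), 4):
        header, sequence, separator, quality = lines[i:i + 4]
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed record starting at line {i + 1}")
        if len(sequence) != len(quality):
            raise ValueError(f"length mismatch in record {header}")
        scores = [ord(char) - PHRED_OFFSET for char in quality]
        yield FastqRecord(header[1:], sequence, scores)


# The two reads from the slide.
EXAMPLE = """\
@HWI-ST143_0345:7:1:1200:2150#CGATGT/1
TTCTTCTGATGCCGGCATCCCTGCTTGCAGGTGTGAAG
+
HHHHHHHHHHHHHHHHFHHHHHHHHHHHHHF=FBF:CF
@HWI-ST143_0345:7:1:1310:2072#CGATGT/1
CGTTTCTAAAGCACCCACTATGGATGNNCAGCAGGACA
+
GFFGEFEFFFGDGGCEBEE=EEEEE9##55++;(1@A:
"""

records = list(parse_fastq(EXAMPLE))
for record in records:
    mean_q = sum(record.qualities) / len(record.qualities)
    print(record.name, len(record.sequence), f"mean Q = {mean_q:.1f}")
```

Under the Phred+33 assumption, the second read drops to very low scores around its two `N` base calls, which illustrates the "error rates" challenge listed on slide 15.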
  17. Sequence analysis - today
      • Massively produced sequence data using next generation sequencing technologies
      • Advantages: applications, applications, applications…; runtime; costs
      @HWI-ST143_0345:7:1:1200:2150#CGATGT/1
      TTCTTCTGATGCCGGCATCCCTGCTTGCAGGTGTGAAG
      +
      HHHHHHHHHHHHHHHHFHHHHHHHHHHHHHF=FBF:CF
      @HWI-ST143_0345:7:1:1310:2072#CGATGT/1
      CGTTTCTAAAGCACCCACTATGGATGNNCAGCAGGACA
      +
      GFFGEFEFFFGDGGCEBEE=EEEEE9##55++;(1@A:
      ...
      Bioinformatics: assembly, annotation, scaffolding, finishing
  18. Sequence analysis - today
      • Massively produced sequence data using next generation sequencing technologies
      • Advantages: applications, applications, applications…; runtime; costs
      Example: Quantitative transcriptomics
      @HWI-ST143_0345:7:1:1200:2150#CGATGT/1
      TTCTTCTGATGCCGGCATCCCTGCTTGCAGGTGTGAAG
      +
      HHHHHHHHHHHHHHHHFHHHHHHHHHHHHHF=FBF:CF
      @HWI-ST143_0345:7:1:1310:2072#CGATGT/1
      CGTTTCTAAAGCACCCACTATGGATGNNCAGCAGGACA
      +
      GFFGEFEFFFGDGGCEBEE=EEEEE9##55++;(1@A:
      ...
  19. Sequence analysis - today
      • Massively produced sequence data using next generation sequencing technologies
      • Advantages: applications, applications, applications…; runtime; costs
      @HWI-ST143_0345:7:1:1200:2150#CGATGT/1
      TTCTTCTGATGCCGGCATCCCTGCTTGCAGGTGTGAAG
      +
      HHHHHHHHHHHHHHHHFHHHHHHHHHHHHHF=FBF:CF
      @HWI-ST143_0345:7:1:1310:2072#CGATGT/1
      CGTTTCTAAAGCACCCACTATGGATGNNCAGCAGGACA
      +
      GFFGEFEFFFGDGGCEBEE=EEEEE9##55++;(1@A:
      ...
      Bioinformatics: alignment, comparison, clustering, quantification
      Workflow:
      data → Preanalysis → cleaned sequences
      Preanalysis:
      • short quality reads
      • low complexity regions
      • cDNA adapters
      • sequencing primers
      option 1: clustering → cluster representatives → BLAST analysis → cluster hits
      option 2: assembly → de novo contigs → BLAST analysis, assembly validation → contig hits
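The preanalysis step of this workflow (removing short or poor-quality reads, low complexity regions, adapters and primers before clustering or assembly) can be sketched as a simple read filter. Everything concrete below is an assumption for illustration: the thresholds, the adapter string, the Phred+33 offset, the crude "one base dominates" complexity test, and the function names are not taken from the slides, and production pipelines use more careful methods.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple

Read = Tuple[str, str, str]  # (name, sequence, quality string)

# All values below are hypothetical placeholders, not GATC settings.
MIN_LENGTH = 30
MIN_MEAN_QUALITY = 20.0
MAX_BASE_FRACTION = 0.8      # above this, a read counts as low complexity
ADAPTER = "AGATCGGAAGAGC"    # placeholder adapter/primer sequence
PHRED_OFFSET = 33            # assumed quality encoding


def mean_quality(quality: str) -> float:
    return sum(ord(char) - PHRED_OFFSET for char in quality) / len(quality)


def trim_adapter(sequence: str, quality: str) -> Tuple[str, str]:
    """Cut the read at the first exact adapter match, if there is one."""
    position = sequence.find(ADAPTER)
    if position == -1:
        return sequence, quality
    return sequence[:position], quality[:position]


def is_low_complexity(sequence: str) -> bool:
    """Crude test: a single base makes up most of the read."""
    most_common_count = Counter(sequence).most_common(1)[0][1]
    return most_common_count / len(sequence) > MAX_BASE_FRACTION


def preanalyse(reads: Iterable[Read]) -> Iterator[Read]:
    for name, sequence, quality in reads:
        sequence, quality = trim_adapter(sequence, quality)
        if len(sequence) < MIN_LENGTH:
            continue  # too short (possibly after trimming)
        if mean_quality(quality) < MIN_MEAN_QUALITY:
            continue  # poor overall quality
        if is_low_complexity(sequence):
            continue  # e.g. poly-A stretches
        yield name, sequence, quality


GOOD = "ACGTTGCAAGCTTAGGCTAACGTTAGCCATGGATCC"
example_reads = [
    ("keep", GOOD + "AAGT", "I" * (len(GOOD) + 4)),
    ("short", "ACGTACGT", "I" * 8),
    ("low_quality", GOOD, "#" * len(GOOD)),
    ("poly_a", "A" * 40, "I" * 40),
    ("adapter", GOOD + ADAPTER + "TTTT", "I" * (len(GOOD) + len(ADAPTER) + 4)),
]

kept = list(preanalyse(example_reads))
print([name for name, _, _ in kept])
```

Trimming before the length check matters: a read that is mostly adapter becomes too short only after trimming, so the order of the steps changes which reads survive.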
  20. Sequence analysis - today
      [Example BLAST result tables, overlaid on the workflow diagram of slide 19]
      Read hits (first rows):
      Query | QLength | %HitLength | HitLength | %Identity | e-value | GeneID | GeneLength
      GD3X8YD02G3UTK | 473 | 105.07 | 497 | 88 | 1.00E-138 | 59783566 | 778
      GD3X8YD02HAS2D | 504 | 103.57 | 522 | 92 | 0 | 112201467 | 867
      GD3X8YD01EQVR9 | 438 | 103.42 | 453 | 89 | 1.00E-129 | 82985781 | 904
      GD3X8YD02F9LP1 | 372 | 103.23 | 384 | 92 | 1.00E-140 | 194673237 | 7376
      GD3X8YD01BBAL3 | 435 | 103.22 | 449 | 91 | 1.00E-165 | 112362035 | 3276
      GD3X8YD02IRTW2 | 413 | 103.15 | 426 | 87 | 1.00E-104 | 56145323 | 770
      GD3X8YD01C9534 | 418 | 103.11 | 431 | 93 | 1.00E-167 | 157279321 | 3170
      …
      Cluster hits, columns: Cluster ID | Length (bp) | %HitLength | e-value | UniGene ID | Gene Length
      Contig hits, columns: Contig ID | Length (bp) | Hit Start | Hit End
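The slide does not define the %HitLength column, but the first rows of the hit table are consistent with it being the hit length as a percentage of the query length (for example 497 / 473 ≈ 105.07 %). The sketch below checks this reading against four rows copied from the slide; the formula is an inference from the numbers, not something the talk states, and the names `ROWS` and `percent_hit_length` are illustrative.

```python
# (query, query length, hit length, %HitLength as printed on the slide)
ROWS = [
    ("GD3X8YD02G3UTK", 473, 497, 105.07),
    ("GD3X8YD02HAS2D", 504, 522, 103.57),
    ("GD3X8YD01EQVR9", 438, 453, 103.42),
    ("GD3X8YD02F9LP1", 372, 384, 103.23),
]


def percent_hit_length(query_length: int, hit_length: int) -> float:
    """Hit length as a percentage of query length (inferred definition)."""
    return round(100.0 * hit_length / query_length, 2)


computed = [percent_hit_length(qlen, hlen) for _, qlen, hlen, _ in ROWS]

for (query, _, _, printed), value in zip(ROWS, computed):
    print(f"{query}: computed {value:.2f}, slide {printed:.2f}")
```

Values above 100 % would then simply mean the aligned region on the hit is slightly longer than the read, for instance because of gaps in the alignment.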
  21. Sequencing applications
      DNA
      • single reads in tubes & plates (PCR, plasmids)
      • whole (meta)genome de novo sequencing
      • whole genome re-sequencing
      • targeted re-sequencing (enrichment, amplicons, exons)
      • methylome / epigenome studies
      • ChIP-Seq
      RNA
      • eukaryotic / prokaryotic cDNA de novo sequencing
      • eukaryotic / prokaryotic cDNA re-sequencing (3’ UTR / 5’ UTR)
      • smallRNA / microRNA
  22. Sequence analysis - today
      • Massively produced sequence data using next generation sequencing technologies
      • Advantages: applications, applications, applications…; turnover; costs
      • Challenges: data analysis and interpretation; hardware infrastructure; data storage; software development; error rates
  23. Conclusion
      Provisioning bioinformatics: Are we prepared?
      GAP between advancements in sequencing technologies (data quantity and application complexity) and advancements in information technologies (hardware and software)
      SOLUTION: cloud computing, GPU usage, software development, parallelization...
  24. Thanks for your kind attention. Open questions? www.gatc-biotech.com