Evaluating and improving the chick
     genome & transcriptome


                 C. Titus Brown
        Asst Prof, CSE and Microbiology;
               BEACON NSF STC
            Michigan State University
                  ctb@msu.edu
Acknowledgements
This is joint work with Hans Cheng (USDA ADOL) and Jerry
  Dodgson (MSU).

Likit Preeyanon (MSU) and Alexis Black Pyrkosz (ADOL)
  did the work.

All of the software discussed in this talk is available.


    This work was primarily supported by the USDA NIFA
                    through a grant to me.
Simulations show that incomplete gene reference
=> inaccurate differential expression from mRNAseq

[Figure: % transcripts expressed inaccurately (2-fold difference) vs. % reference completeness, for single-end reads (left panel) and paired-end reads (right panel); separate curves for 100%, 75%, 50% and 25% expression.]

Alexis Black Pyrkosz
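The slide does not spell out how "expressed inaccurately (2-fold difference)" is scored. A minimal sketch, assuming a simulated transcript counts as inaccurate when its estimated expression is at least 2-fold above or below its true (simulated) value, and that a transcript missing from the reference gets no estimate at all. The function names and the toy numbers are hypothetical.

```python
def is_inaccurate(true_expr: float, est_expr: float, fold: float = 2.0) -> bool:
    """True if the estimate is off from the true value by at least `fold`-fold.

    Assumption: "2-fold difference" means est/true >= 2 or est/true <= 1/2.
    """
    if true_expr <= 0:
        raise ValueError("true expression must be positive")
    if est_expr <= 0:
        # Nothing was quantified, e.g. the transcript is absent from the reference.
        return True
    ratio = est_expr / true_expr
    return ratio >= fold or ratio <= 1.0 / fold


def percent_inaccurate(truth: dict[str, float], estimates: dict[str, float],
                       fold: float = 2.0) -> float:
    """Percentage of simulated transcripts whose estimate is >= `fold`-fold off."""
    flagged = sum(is_inaccurate(value, estimates.get(name, 0.0), fold)
                  for name, value in truth.items())
    return 100.0 * flagged / len(truth)


# Hypothetical example: transcript "d" is missing from the (incomplete) reference.
truth = {"a": 10.0, "b": 10.0, "c": 10.0, "d": 10.0}
estimates = {"a": 10.0, "b": 4.0, "c": 12.0}
print(percent_inaccurate(truth, estimates))
```

Here "b" is off by 2.5-fold and "d" has no estimate, so half of the toy transcripts are flagged; lowering reference completeness in a simulation pushes more transcripts into the second case.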
Existing chick gene models lack exons, isoforms

[Figure: our data vs. existing gene models]

 *This gene contains at least 4 isoforms.
                                            Likit Preeyanon
(Exon detection is pretty good.)
                            Likit Preeyanon
Different approaches to gene set prediction
yield distinct splice junction predictions




     > 95% of the assembly-based splice junctions are
        supported by 4 or more independent reads.
                                                     Likit Preeyanon
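A minimal sketch of how junction support like this can be tallied, assuming spliced alignments have already been reduced to (read ID, chromosome, intron start, intron end) records. "Independent" is taken here to mean distinct read IDs; a stricter reading (e.g. distinct alignment start positions) would need a different key. All names are hypothetical.

```python
from collections import defaultdict

# Hypothetical representation: (chromosome, intron start, intron end).
Junction = tuple[str, int, int]


def junction_support(spliced_alignments) -> dict[Junction, int]:
    """Count distinct read IDs spanning each junction.

    `spliced_alignments` yields (read_id, chrom, intron_start, intron_end);
    counting read IDs (not alignments) keeps a read reported twice from
    being counted twice.
    """
    reads_per_junction = defaultdict(set)
    for read_id, chrom, start, end in spliced_alignments:
        reads_per_junction[(chrom, start, end)].add(read_id)
    return {junction: len(ids) for junction, ids in reads_per_junction.items()}


def fraction_supported(predicted: list[Junction], spliced_alignments,
                       min_reads: int = 4) -> float:
    """Fraction of predicted junctions backed by at least `min_reads` reads."""
    support = junction_support(spliced_alignments)
    supported = sum(1 for j in predicted if support.get(j, 0) >= min_reads)
    return supported / len(predicted)


predicted = [("chr1", 100, 200), ("chr1", 300, 400)]
alignments = [
    ("r1", "chr1", 100, 200), ("r2", "chr1", 100, 200),
    ("r3", "chr1", 100, 200), ("r4", "chr1", 100, 200),
    ("r1", "chr1", 100, 200),  # same read again: not independent
    ("r5", "chr1", 300, 400),
]
print(fraction_supported(predicted, alignments))
```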
mRNAseq analysis with a combined de novo
and genome-based approach.




                                 Likit Preeyanon
We can produce combined gene models.




       Cufflinks (ref based) + de novo assembly + known mRNA
Gene Model Summary
     (note: spleen mRNAseq)

Method                        Genes    Transcripts
Global Assembly              14,832         32,311
Local Assembly               15,297         23,028
Global + Local Assembly      15,934         46,797

 *Gene and transcript counts may be overestimated due to incomplete assembly
  and spurious splice junctions.
Cross-validation with technical replicates

Dataset              Single-end mapped     Single-end unmapped   Paired-end mapped     Paired-end unmapped
Line 6 uninfected    18,375,966 (77.93%)   5,203,586 (22.07%)    21,598,218 (64.16%)   12,065,659 (35.84%)
Line 6 infected      17,160,695 (73.18%)   6,288,286 (26.82%)    15,274,638 (63.89%)    8,633,855 (36.11%)
Line 7 uninfected    18,130,072 (75.78%)   5,795,737 (24.22%)    20,961,033 (63.67%)   11,960,299 (36.33%)
Line 7 infected      19,912,046 (78.51%)   5,450,521 (21.49%)    22,485,833 (65.22%)   11,992,002 (34.78%)

Single-end reads were used to generate gene models; paired-end data
were used for technical-replicate cross-validation.
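Each percentage in the table is mapped / (mapped + unmapped) for that dataset and read layout. A quick way to recompute them from the raw counts (the dictionary layout and function name are just for illustration):

```python
# Read counts (mapped, unmapped) per dataset and read layout, from the table.
COUNTS = {
    ("Line 6 uninfected", "single-end"): (18_375_966, 5_203_586),
    ("Line 6 uninfected", "paired-end"): (21_598_218, 12_065_659),
    ("Line 6 infected", "single-end"): (17_160_695, 6_288_286),
    ("Line 6 infected", "paired-end"): (15_274_638, 8_633_855),
    ("Line 7 uninfected", "single-end"): (18_130_072, 5_795_737),
    ("Line 7 uninfected", "paired-end"): (20_961_033, 11_960_299),
    ("Line 7 infected", "single-end"): (19_912_046, 5_450_521),
    ("Line 7 infected", "paired-end"): (22_485_833, 11_992_002),
}


def percent_mapped(mapped: int, unmapped: int) -> float:
    """Mapped reads as a percentage of all reads, rounded to two decimals."""
    return round(100.0 * mapped / (mapped + unmapped), 2)


for (dataset, layout), (mapped, unmapped) in COUNTS.items():
    print(f"{dataset:<18} {layout:<10} {percent_mapped(mapped, unmapped):6.2f}% mapped")
```

Single-end mapping rates sit around 73-79% and paired-end rates around 64-65% in every line, which is what makes the paired-end runs usable as a consistent cross-check.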
Gene Modeler Pipeline (“gimme”)
- Merge transcripts based on their alignments to the genome; existing gene
  predictions can be included, and predictions can be combined iteratively.
- Construct gene models
- Remove redundant sequences
- Predict strands and ORFs




                                                           Likit Preeyanon
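The slide lists gimme's steps but not its algorithms. The following is only a minimal sketch of two of those ideas, assuming transcripts are already aligned to the genome as sorted exon lists: group transcripts whose genomic spans overlap into genes, then drop transcripts that repeat another transcript's intron chain. `Transcript`, `cluster_into_genes` and `remove_redundant` are hypothetical names, not gimme's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    """Hypothetical record: a transcript aligned to the genome as sorted exons."""
    name: str
    chrom: str
    exons: tuple[tuple[int, int], ...]  # sorted (start, end) genomic coordinates

    @property
    def start(self) -> int:
        return self.exons[0][0]

    @property
    def end(self) -> int:
        return self.exons[-1][1]

    @property
    def introns(self) -> tuple[tuple[int, int], ...]:
        # Each intron runs from one exon's end to the next exon's start.
        return tuple((a[1], b[0]) for a, b in zip(self.exons, self.exons[1:]))


def cluster_into_genes(transcripts) -> list[list[Transcript]]:
    """Group transcripts whose genomic spans overlap (same chromosome) into genes."""
    genes: list[list[Transcript]] = []
    current: list[Transcript] = []
    current_chrom, current_end = None, -1
    for t in sorted(transcripts, key=lambda tx: (tx.chrom, tx.start)):
        if current and t.chrom == current_chrom and t.start < current_end:
            current.append(t)
            current_end = max(current_end, t.end)
        else:
            if current:
                genes.append(current)
            current, current_chrom, current_end = [t], t.chrom, t.end
    if current:
        genes.append(current)
    return genes


def remove_redundant(gene: list[Transcript]) -> list[Transcript]:
    """Keep one transcript per distinct intron chain: the one with the longest span."""
    best: dict[tuple[tuple[int, int], ...], Transcript] = {}
    for t in gene:
        kept = best.get(t.introns)
        if kept is None or (t.end - t.start) > (kept.end - kept.start):
            best[t.introns] = t
    return list(best.values())
```

Span-overlap clustering ignores strand and will merge nested or overlapping genes; a real gene modeler has to handle both, and still has to predict strand and ORFs afterwards.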
Next problem: chick reference!
- We like using the reference genome to scaffold RNAseq contigs;
  purely de novo RNAseq assembly is messy.
- Genomes are also useful for other things, we hear.
Problems:
- Poor sensitivity: the chick genome is missing a substantial number
  of genes from microchromosomes:
  723 genes from HSA19q are missing from chicken galGal4;
  ESTs and RNAseq transcripts exist for many or most of them.
- Gaps:
  9,900 gaps on ordered chromosomes;
  21k gaps in chromosome-aligned but low-confidence or unaligned sequence.
- Over-collapsed tandem duplications and under-collapsed heterozygous regions
Sensitivity – where is the problem?
Are microchromosomes hard to sequence or is
  microchromosomal sequence hard to assemble?

Sequences that simply don’t show up in the data are hard to
  include in the assembly…
  Unclonable (Sanger)
  Strong GC or AT bias


Sequences with biased (generally low) coverage are often
  discarded by assemblers.
Can we “even out” coverage?
(Digital normalization)


                         If you have two loci, or two
                         mRNA species, with uneven
                        coverage, can you remove the
                                extra coverage?
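One common formulation of digital normalization (e.g. khmer's `normalize-by-median.py`) estimates each read's coverage as the median abundance of its k-mers among the reads kept so far, and discards the read once that estimate reaches a cutoff C. A minimal sketch: it uses an exact `Counter` instead of the probabilistic Count-Min Sketch khmer relies on, so it only suits toy data, and the `k` and `cutoff` defaults are illustrative.

```python
from collections import Counter
from statistics import median

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Lesser of a k-mer and its reverse complement, so both strands count together."""
    return min(kmer, kmer.translate(_COMPLEMENT)[::-1])


def digital_normalization(reads, k: int = 20, cutoff: int = 20) -> list[str]:
    """Keep a read only if its median k-mer count (over kept reads) is below `cutoff`."""
    counts = Counter()
    kept = []
    for read in reads:
        read_kmers = [canonical(read[i:i + k]) for i in range(len(read) - k + 1)]
        if not read_kmers:
            continue  # read shorter than k: nothing to estimate coverage from
        # Counter returns 0 for unseen k-mers without inserting them.
        if median(counts[km] for km in read_kmers) < cutoff:
            kept.append(read)
            counts.update(read_kmers)  # only kept reads contribute to coverage
    return kept


repeated, unique = "ACGGTCATTGCA", "TTAGCCGATAGG"
kept = digital_normalization([repeated] * 50 + [unique], k=5, cutoff=5)
print(kept.count(repeated), unique in kept)
```

With a cutoff of 5, the high-coverage read is kept only until its k-mers reach that count, while the low-coverage read survives untouched: the "extra coverage" is removed and the rare locus is preserved.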
Coverage before digital normalization:


                                  (MD amplified)
Coverage after digital normalization:
- Normalizes coverage
- Discards redundancy
- Eliminates the majority of errors
- Scales assembly dramatically
- Assembly is 98% identical
Prelim results from digital
normalization
Reassembled chick genome contigs from 70x Illumina ->
 normalized reads in ~24 hours.
Obtained 40 Mbp of assembled contigs that were not present
 in galGal4.
Contig assembly contained partial or complete matches to
 70% of previously unmappable transcripts assembled from
 chick spleen mRNAseq.

⇒Bioinformatics remedies may help but are probably not
  sufficient.

                                                  Likit Preeyanon
Can we improve the assembly?

Read cleaning and improvement
  1. Digital normalization evens out relative coverage, permitting recovery
     of difficult-to-sequence regions in assemblies.
  2. Error correction and read-to-graph concordance editing collapse
     heterozygous regions.
  3. Paired-end de Bruijn graphs can be used to include long-distance
     constraints in primary contig assembly.
  4. RNAseq data indicate contigs that can be combined into scaffolds.

Selection of strategies and parameters

Contig assembly and/or scaffolding

Assembly assessment
  1. High-abundance k-mers present in the sequence data but missing from
     the assembly indicate poor sensitivity.
  2. Discordant long-insert mate pairs indicate potentially erroneous
     contigs and scaffolds.
  3. De novo RNAseq assembly can identify likely misassemblies and
     positively identify missing genomic sequence.
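Assessment checks 1 and 2 can be sketched in a few lines. A minimal illustration, assuming reads and contigs are plain strings and that mate pairs have already been placed on contigs as (name, contig, position) records; `min_count`, the insert-size window, and all function and field names are hypothetical choices, not values from the talk. Read orientation is ignored for simplicity.

```python
from collections import Counter
from typing import NamedTuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmers(seq: str, k: int) -> list[str]:
    """All k-mers of `seq`, each replaced by the lesser of it and its reverse complement."""
    out = []
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        out.append(min(kmer, kmer.translate(_COMPLEMENT)[::-1]))
    return out


def missing_high_abundance_kmers(reads, contigs, k: int = 21,
                                 min_count: int = 10) -> dict[str, int]:
    """Check 1: abundant read k-mers that never appear in the assembly."""
    read_counts = Counter(km for read in reads for km in canonical_kmers(read, k))
    assembled = {km for contig in contigs for km in canonical_kmers(contig, k)}
    return {km: n for km, n in read_counts.items()
            if n >= min_count and km not in assembled}


class MatePair(NamedTuple):
    name: str
    contig1: str
    pos1: int
    contig2: str
    pos2: int


def discordant_pairs(pairs, min_insert: int, max_insert: int) -> list[str]:
    """Check 2: pairs split across contigs or placed at an unexpected distance."""
    flagged = []
    for p in pairs:
        distance = abs(p.pos2 - p.pos1)
        if p.contig1 != p.contig2 or not (min_insert <= distance <= max_insert):
            flagged.append(p.name)
    return flagged


reads = ["AACCGGTTAC"] * 10 + ["GGGATCCCTA"] * 10
print(sorted(missing_high_abundance_kmers(reads, ["AACCGGTTAC"], k=5)))

pairs = [
    MatePair("ok", "ctg1", 100, "ctg1", 3100),
    MatePair("split", "ctg1", 100, "ctg2", 500),
    MatePair("short", "ctg1", 100, "ctg1", 400),
]
print(discordant_pairs(pairs, min_insert=2000, max_insert=5000))
```

A single pair split across two contigs is also ordinary scaffolding evidence; it is a consistent pattern of discordant pairs at one location that points to a likely misassembly.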
slides from http://slideshare.net/flxlex/ ; Lex Nederbragt


Longer reads!

[Figure: long reads spanning repeat copies 1 and 2, and spanning polymorphic contigs 2 and 3 between contigs 1 and 4]

       Long reads can span repeats and heterozygous regions.
slides from http://slideshare.net/flxlex/ ; Lex Nederbragt


PacBio: first results (cod/salmon)
[Figure: raw reads]

Cod: PacBio results
         Mapping to the published genome
[Figure: 11.4 kbp, 10.6 kbp and 10.9 kbp subreads aligned to the published cod genome]

          slides from http://slideshare.net/flxlex/ ; Lex Nederbragt
Need to combine Illumina + PacBio still.
P_errorCorrection pipeline from

[Figure: raw and error-corrected reads (coverage labels 2.7x and 23x); alignments of at least 1 kb to the published cod assembly]
- 93% of reads recovered
- 24 CPUs, 4.5 days, 100 GB RAM

slides from http://slideshare.net/flxlex/ ; Lex Nederbragt
Concluding thoughts/comments
Gene models and reference genome both need work.


This is going to be a continuing process…


Together with Wes Warren (WUSTL), Hans Cheng (USDA ADOL), and Jerry
  Dodgson (MSU), we are proposing to apply PacBio sequencing and digital
  normalization to improve the chick genome and to regularly integrate
  community improvements; this should be a generalizable approach.

   Questions? Contact me at: ctb@msu.edu

2013 pag-poultry-workshop
