20081216 06 陳倩琪: Sequencing and Analysis of the Monascus Genome

Published in: Health & Medicine, Technology

  1. Sequencing and Analysis of the Monascus pilosus Genome. Food Industry Research and Development Institute (FIRDI), Bioresource Collection and Research Center (BCRC). 陳倩琪, 王俊霖, 宋立民, 邱世浩, 邱祖培, 廖麗玲, 袁國芳, 朱文深, 廖啟成
  2. Outline
     - Genome Sequence
     - Genome Analysis
  3. Genome Sequence
  4. Gene Feature (diagram labels: mRNA, genomic DNA, regulatory region, exon, intron, splicing)
  5. Sequencing Strategy for Monascus (flowchart labels: cDNA library; BAC/fosmid library; expressed sequence tag; BAC/fosmid end sequence; whole genome shotgun library; genome draft finishing; BAC/fosmid clone; BAC/fosmid clone shotgun sequence; function assay; unigene; functional genome; shotgun sequence; annotation)
  6. Monascus Genome Sequence
     - Genome sequence
       - EST, expressed sequence tag
       - WGS, whole genome sequencing
       - Assembly
  7. EST (expressed sequence tag) (diagram: cell → mRNA extraction → reverse transcription of mRNA to cDNA → replication to ds cDNA → end sequencing from the 5' and 3' ends; example reads: GATCGTCCTGCTAGAA, TAGGCTTGGGTAACCT, GTAACGTCCTAGCCCT)
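  8. Monascus EST: Statistics of ESTs
     Tentative unigenes*: 6,887 (*4,168 contigs + 2,719 singletons)
     Qualified reads (Q20), all libraries: 40,604

     | Library | Contigs No. |
     |---------|-------------|
     | mpa02   | 0           |
     | mpa03   | 1,369       |
     | mpa04   | 1,471       |
     | mpa05   | 2,030       |
     | mpa08   | 844         |

     mpa02: 211 qualified reads; pigment production ND; MK (monacolin K) production ND.
     Values for the other four libraries, in the order extracted (the row assignment did not survive):
     - Qualified reads (Q20): 5,365; 15,881; 10,016; 9,131
     - Pigment production: +; +; ++++++; +++++
     - MK production: --; +; ++; +++++++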
  9. EST Assembly
     - All ESTs were clustered by BLAST at 90% nucleotide homology.
     - All ESTs were assembled with CAP3, joining two or more ESTs that overlapped by at least 40 bases with at least 94% sequence identity.
     (diagram: EST1 and EST2 are trimmed, then assembled by CAP3 into a contig (unigene))
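The overlap rule on slide 9 (at least 40 bases, at least 94% identity) can be illustrated with a minimal sketch. This is not CAP3 itself: CAP3 uses gapped alignment and quality values, whereas the hypothetical helper `overlap_passes` below only checks ungapped suffix-to-prefix overlaps. The function name and the toy sequences are assumptions made for illustration.

```python
def overlap_passes(est_a: str, est_b: str,
                   min_len: int = 40, min_identity: float = 0.94) -> bool:
    """Return True if a suffix of est_a overlaps a prefix of est_b
    over at least min_len bases with at least min_identity identity.

    Simplification (assumption): the overlap is ungapped.
    """
    max_len = min(len(est_a), len(est_b))
    # Try the longest candidate overlap first, down to the minimum length.
    for length in range(max_len, min_len - 1, -1):
        suffix = est_a[-length:]
        prefix = est_b[:length]
        matches = sum(1 for x, y in zip(suffix, prefix) if x == y)
        if matches / length >= min_identity:
            return True
    return False


# Toy data: two reads sharing an exact 50-base region.
core = "ACGTTGCA" * 6 + "GG"          # 50 bases
est_a = "T" * 20 + core               # shared region at the 3' end
est_b = core + "A" * 20               # shared region at the 5' end

print(overlap_passes(est_a, est_b))           # True
print(overlap_passes("A" * 100, "C" * 100))   # False
```

Reads shorter than 40 bases can never satisfy the rule, because no candidate overlap reaches the minimum length.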
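  10. Monascus EST: Statistics of ESTs and Unigenes
      Tentative unigenes*: 7,488 (*4,015 contigs + 3,473 singletons)
      Qualified reads (Q20), all libraries: 40,604

      | Library | Contigs No. |
      |---------|-------------|
      | mpa02   | 0           |
      | mpa03   | 1,369       |
      | mpa04   | 1,471       |
      | mpa05   | 2,030       |
      | mpa08   | 844         |

      mpa02: 211 qualified reads; pigment production ND; MK (monacolin K) production ND.
      Values for the other four libraries, in the order extracted (the row assignment did not survive):
      - Qualified reads (Q20): 5,365; 15,881; 10,016; 9,131
      - Pigment production: +; +; ++++++; +++++
      - MK production: --; +; ++; +++++++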
  11. Fungal ESTs in Public Databases (source: TIGR Gene Indices, June 18, 2008)

      | Species | EST Contig | EST Singleton | Tentative Unigene |
      |---------|------------|---------------|-------------------|
      | Aspergillus flavus | 4,026 | 4,111 | 8,137 |
      | Coccidioides posadasii | 6,893 | 7,810 | 14,703 |
      | Magnaporthe grisea | 11,432 | 15,552 | 26,984 |
      | Neurospora crassa | 10,927 | 3,569 | 14,496 |
      | Aspergillus nidulans | 3,662 | 9,894 | 13,556 |
      | Cryptococcus sp. (Filobasidiella neoformans) | 2,384 | 3,290 | 5,674 |
      | Saccharomyces cerevisiae | 4,107 | 2,203 | 6,310 |
      | Schizosaccharomyces pombe | 2,449 | 3,484 | 5,933 |
      | Monascus pilosus | 4,015 | 3,473 | 7,488 |
      | Alternaria solani Potato_late_blight (new in 2008) | 12,149 | 24,322 | 36,471 |
      | Gibberella fujikuroi (Gibberella moniliformis) | 8,510 | 4,840 | 13,350 |
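The figures on slide 11 follow the same rule stated for Monascus on slide 10: tentative unigenes = contigs + singletons. The sketch below checks that identity for every species; the pairing of contig and singleton counts with species is an assumption based on that rule, and the variable and function names are illustrative.

```python
# (species, tentative unigenes, EST contigs, EST singletons) from slide 11.
# Assumption: contig/singleton values are paired with species so that
# contigs + singletons equals the tentative unigene count.
FUNGAL_EST = [
    ("Aspergillus flavus", 8137, 4026, 4111),
    ("Coccidioides posadasii", 14703, 6893, 7810),
    ("Magnaporthe grisea", 26984, 11432, 15552),
    ("Neurospora crassa", 14496, 10927, 3569),
    ("Aspergillus nidulans", 13556, 3662, 9894),
    ("Cryptococcus sp.", 5674, 2384, 3290),
    ("Saccharomyces cerevisiae", 6310, 4107, 2203),
    ("Schizosaccharomyces pombe", 5933, 2449, 3484),
    ("Monascus pilosus", 7488, 4015, 3473),
    ("Alternaria solani", 36471, 12149, 24322),
    ("Gibberella fujikuroi", 13350, 8510, 4840),
]


def inconsistent_rows(rows):
    """Return the species whose unigene count differs from contigs + singletons."""
    return [name for name, unigenes, contigs, singletons in rows
            if contigs + singletons != unigenes]


print(inconsistent_rows(FUNGAL_EST))  # [] when every row is consistent
```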
  12. Aspergillus nidulans Genome
      - Whole Genome Shotgun methodology
        - DNA is shattered into small fragments (~4 kb or ~40 kb)
        - Each fragment is inserted into a vector and cloned
        - The two ends of the fragment are sequenced, creating paired reads
      - Arachne Assembly methodology
        - The assembly process uses the paired reads to identify contiguous stretches of sequence (contigs)
        - Contigs are ordered and linked together into larger supercontigs by using paired reads lying in different contigs
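The last step on slide 12, linking contigs into supercontigs through paired reads that land in different contigs, can be sketched as a grouping problem. This is a toy illustration, not the Arachne algorithm: it ignores read orientation and insert size, does not order the contigs, and the `min_support` threshold of two read pairs is an assumption.

```python
from collections import Counter


def link_contigs(pair_hits, min_support=2):
    """Group contigs that are connected by enough read pairs.

    pair_hits: iterable of (contig_of_read_1, contig_of_read_2),
               one entry per paired read.
    Returns a sorted list of contig groups (candidate supercontigs).
    """
    support = Counter()
    contigs = set()
    for a, b in pair_hits:
        contigs.update((a, b))
        if a != b:  # only pairs spanning two contigs carry linking information
            support[frozenset((a, b))] += 1

    # Union-find over contigs, merging those with enough supporting pairs.
    parent = {c: c for c in contigs}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for link, count in support.items():
        if count >= min_support:
            a, b = tuple(link)
            parent[find(a)] = find(b)

    groups = {}
    for c in contigs:
        groups.setdefault(find(c), set()).add(c)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical placements: A-B is supported twice, B-C only once.
pairs = [("A", "A"), ("A", "B"), ("A", "B"), ("B", "C"), ("D", "D")]
print(link_contigs(pairs))  # [['A', 'B'], ['C'], ['D']]
```

Requiring more than one supporting pair is a common guard against chimeric clones; a single stray pair (B-C above) does not merge two contigs.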
  13. Monascus WGS
      coverage = (qualified reads × average read length) / genome size
      - BAC library: mpb01-02 (80-100 kb insert)
      - Fosmid library: mpf01 (30-40 kb insert)
      - Plasmid library: mpg01-12 (2.5-3.5 kb insert), 31-34 (4.5-7.5 kb insert), 61-64 (8-10 kb insert)
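Slide 13 relates coverage to qualified reads, average read length, and genome size; the operators were lost in extraction, so the sketch below assumes the standard definition (reads times read length, divided by genome size). The input numbers are hypothetical and are not the Monascus project's figures.

```python
def sequencing_coverage(qualified_reads: int,
                        average_read_length: float,
                        genome_size: float) -> float:
    """Fold coverage = total sequenced bases / genome size."""
    return qualified_reads * average_read_length / genome_size


# Hypothetical example: 400,000 reads of 700 bp on a 28 Mb genome.
print(sequencing_coverage(400_000, 700, 28_000_000))  # 10.0
```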
  14. Genome Assembly
      - Draft: gaps allowed
      - Finished: no gaps and a 0.01% error rate
      (diagram: Contig A and Contig B, separated by a gap, joined into a supercontig)
  15. Contigs Arranged in Order
  16. Linkage Information of Gaps
  17. Genome Assembly Draft: Arachne program, licensed from the MIT/Whitehead genome center
  18. Evaluation of Assembly
      The N50 length is the length L such that 50% of the bases are contained in contigs/supercontigs of size at least L. The contig N50 length is 224.5 kb; that is, 50% of all bases are contained in contigs of at least 224.5 kb.

      Contigs (N50 length = 224.5 kb)

      | No. | ID | Size (bp) | Cumulative total (bp) |
      |-----|----|-----------|-----------------------|
      | 1 | MPGC00020726 | 623,226 | |
      | 2 | MPGC00020904 | 596,733 | |
      | 3 | MPGC00020728 | 578,736 | |
      | 4 | MPGC00020911 | 564,379 | |
      | 5 | MPGC00020776 | 520,988 | |
      | 6 | MPGC00020893 | 517,965 | |
      | 7 | MPGC00020902 | 433,630 | |
      | 8 | MPGC00020697 | 432,590 | |
      | 9 | MPGC00020790 | 425,346 | |
      | 10 | MPGC00020828 | 403,969 | |
      | 11 | MPGC00020741 | 395,331 | |
      | 12 | MPGC00020775 | 391,998 | |
      | 39 | MPGC00020854 | 224,531 | 13,333,461 |

      Supercontigs (N50 length = 2,527 kb)

      | No. | ID | Size (bp) | Cumulative total (bp) |
      |-----|----|-----------|-----------------------|
      | 1 | MPSC013001 | 3,566,104 | |
      | 2 | MPSC013002 | 3,562,398 | |
      | 3 | MPSC013003 | 2,915,166 | |
      | 4 | MPSC013004 | 2,676,992 | |
      | 5 | MPSC | 2,526,505 | 15,247,465 |
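The N50 definition on slide 18 translates directly into a short computation: sort the sequences from longest to shortest and walk down until the running total reaches half of all bases. The sketch below uses toy lengths, not the Monascus contigs, since the slide lists only part of the assembly.

```python
def n50(lengths):
    """Return the N50 length: the largest L such that sequences of
    length >= L contain at least 50% of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # reached half of the total bases
            return length
    raise ValueError("n50() needs at least one sequence length")


# Toy example: total = 32, half = 16, reached after the two 8-base contigs.
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))  # 8
```

Applied to a full list of contig sizes, the same walk would stop at the contig where the cumulative total first reaches half of the assembly, which is how the slide's cumulative column is read.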
  19. Contig Coverage Statistics
      - Coverage of genome
        - total length of included sequence / genome size
        - contigs of total length 26,428,892 bp
      - Coverage of known sequence
        - total length of included sequence / known sequence*
        - *known sequence consists of 13 BAC contigs of total length 1,256,569 bp

      Coverage statistics

      | Sequence included | Contigs | Coverage of genome | Contigs aligned to known sequence* | Coverage of known sequence |
      |-------------------|---------|--------------------|------------------------------------|----------------------------|
      | all contigs and gaps | 709 | 97.88% | 37 | 99.05% |
  20. Other Fungal Genomes in Public Databases
      Current fungal sequence projects: 44 candidates (source: Broad Institute of MIT & Harvard, Dec 12, 2008)
      Species listed: Aspergillus nidulans, Aspergillus terreus, Botrytis cinerea, Candida albicans, Aspergillus terreus, Batrachochytrium dendrobatidis, Candida tropicalis, Chaetomium globosum, Coccidioides immitis, Candida guilliermondii, Candida lusitaniae, Coccidioides posadasii, Rhizopus oryzae, Sclerotinia sclerotiorum, Saccharomyces cerevisiae, Schizosaccharomyces japonicus, Schizosaccharomyces pombe, Stagonospora nodorum, Uncinocarpus reesii, Ustilago maydis, Coprinus cinereus, Cryptococcus neoformans, Fusarium graminearum, Fusarium oxysporum, Fusarium verticillioides, Histoplasma capsulatum, Lodderomyces elongisporus, Magnaporthe grisea, Neurospora crassa, Paracoccidioides brasiliensis, Puccinia graminis, Pyrenophora tritici-repentis
  21. Genome Analysis
  22. Genome Analysis
      - Purpose
        - Functional annotation by similarity
        - Exon and intron information
        - Alternative splicing
        - Gene prediction
  23. Expressed Gene Annotation: 675 unigenes unknown (updated to 2008/11/20)
  24. Highly Expressed Genes: EST redundancy

      | Cluster_ID | Annotation | mpa03 (% of library) | mpa04 (% of library) | mpa05 (% of library) |
      |------------|------------|----------------------|----------------------|----------------------|
      | MPUG00015767 | homologue to Glyceraldehyde 3-phosphate dehydrogenase | 1.26 | 1.20 | 2.10 |
      | MPUG00015614 | homologue to Elongation factor 1-alpha (EF-1-alpha) | 1.15 | 0.53 | 2.04 |
      | MPUG00013548 | weakly similar to Phosphoenolpyruvate carboxykinase [ATP] | 1.44 | 2.33 | 0.51 |
      | MPUG00015777 | similar to Heat shock 70 kDa protein | 0.99 | 1.62 | 0.86 |
      | MPUG00016317 | weakly similar to 30 kD heat shock protein | 0.33 | 0.12 | 1.66 |
      | MPUG00014394 | similar to heat shock protein 90 homolog | 0.97 | 0.90 | 0.75 |
  25. Intron and Exon
  26. (figure labels: genome, EST)
  27. Monascus Intron
  28. Monascus Exon
  29. Alternative Splicing (AS)
  30. How to Merge the EST Alignment Patterns (figure: Monascus ESTs aligned to the Monascus genome; ESTs with the same pattern are merged, leaving 3 patterns)
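Slide 30 describes merging ESTs that share an alignment pattern against the genome. One way to sketch this, assuming a pattern is defined by the intron chain (the internal exon boundaries) so that ESTs differing only in where they start or end still merge, is shown below. The slide does not state how patterns were compared, so this rule, the function name, and the coordinates are all illustrative assumptions.

```python
from collections import defaultdict


def merge_patterns(alignments):
    """Group ESTs at each locus by their intron chain.

    alignments: iterable of (est_id, locus_id, exon_blocks), where
                exon_blocks is a list of (start, end) genome coordinates
                in ascending order.
    Returns {locus_id: {intron_chain: [est_id, ...]}}.
    """
    patterns = defaultdict(lambda: defaultdict(list))
    for est_id, locus, blocks in alignments:
        # An intron runs from the end of one exon block to the start of the next.
        introns = tuple((blocks[i][1], blocks[i + 1][0])
                        for i in range(len(blocks) - 1))
        patterns[locus][introns].append(est_id)
    return {locus: dict(chains) for locus, chains in patterns.items()}


# Hypothetical alignments: five ESTs at one locus collapse to three patterns.
alignments = [
    ("est1", "locusA", [(100, 200), (300, 400), (500, 600)]),
    ("est2", "locusA", [(120, 200), (300, 400), (500, 580)]),  # same introns as est1
    ("est3", "locusA", [(100, 200), (500, 600)]),              # middle exon skipped
    ("est4", "locusA", [(100, 220), (300, 400), (500, 600)]),  # alternative donor site
    ("est5", "locusA", [(150, 200), (500, 590)]),              # same introns as est3
]
merged = merge_patterns(alignments)
print(len(merged["locusA"]))  # 3
```

A locus with more than one pattern is a candidate for alternative splicing. Single-exon ESTs all map to the empty intron chain in this sketch, so they would need separate handling.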
  31. Monascus Alternative Splicing
      - 513 gene loci show alternative splicing
      - 1,293 different transcripts observed
  32. Gene Prediction
      - By ab initio method
      - Gene prediction by GlimmerHMM

      | Predicted genome | Predicted genes | Nucleotide-level accuracy | Exon-level accuracy |
      |------------------|-----------------|---------------------------|---------------------|
      | M. pilosus BCRC38072 | 9,997 | 0.96 | 0.72 |
      | A. fumigatus* | NA | 0.90 | 0.42 |

      * Prediction qualities for Aspergillus fumigatus are taken from Bioinformatics, 20: 2878-2879 (2004).
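Slide 32 reports exon-level accuracy without saying which measure was used. As a hedged illustration, the sketch below computes the two measures commonly reported for gene finders, exon-level sensitivity and specificity, counting a predicted exon as correct only when both boundaries match a reference exon exactly. The coordinates are hypothetical, and nothing here reproduces the evaluation behind the slide's numbers.

```python
def exon_level_metrics(predicted, reference):
    """Return (sensitivity, specificity) at the exon level.

    predicted, reference: iterables of (start, end) exon coordinates.
    sensitivity = correct exons / reference exons
    specificity = correct exons / predicted exons
    (In gene-finding evaluations "specificity" is conventionally this
    precision-like ratio.)
    """
    predicted, reference = set(predicted), set(reference)
    correct = len(predicted & reference)  # exact match on both boundaries
    sensitivity = correct / len(reference) if reference else 0.0
    specificity = correct / len(predicted) if predicted else 0.0
    return sensitivity, specificity


# Hypothetical exons: two of three predictions match the four reference exons.
reference_exons = [(1, 100), (201, 300), (401, 500), (601, 700)]
predicted_exons = [(1, 100), (201, 300), (401, 520)]
sn, sp = exon_level_metrics(predicted_exons, reference_exons)
print(sn, round(sp, 3))  # 0.5 0.667
```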
  33. Gene Prediction
  34. Gene Prediction: 9,997 predicted genes
  35. Gene Ontology, InterPro, gene product properties
  36. Genome Annotation: 891 genes unknown (updated to 2008/11/20)
  37. Gene Feature
  39. Lovastatin synthase gene cluster 1
  40. Lovastatin synthase gene cluster 2
  42. Polyketide-related Genes in Monascus
  43. Polyketide-related Genes in Monascus
  44. How to Access the Monascus Database
  45. Monascus Genome Database
  48. Acknowledgment
      - Sponsor
        - Ministry of Economic Affairs (MOEA)
      - Bioinformatics
        - Bioinformatics group of BCRC/FIRDI
      - Sequencing
        - The Sequencing Core Facility of National Yang Ming University Genome Center (YMGC)
        - Sequencing service of VitaGenomics
        - Sequencing group of BCRC/FIRDI
  49. Thanks for your attention
  50. Citrinin Synthase vs Activator
