A
C
G
T
The Medicago truncatula genome:
a progress report
Dr. Bruce A. Roe
Advanced Center for Genome Technology
Department of Chemistry and Biochemistry
University of Oklahoma
broe@ou.edu www.genome.ou.edu
Plant and Animal Genome
San Diego, January 11, 2004
Photos by Steve Hughes, Genetic Resource Centre (PIRSA-SARDI), Adelaide, Australia.
http://www.fao.org/ag/AGP/AGPC/doc/gallery/pictures/meditrunc/meditrunc.htm
Why sequence the Medicago genome?
• An important forage crop
• A genetically tractable model legume
• A relatively small (~500 Mbp) diploid genome
• Active legume research community
• Medicago Research Consortium
• Large collection of ESTs
• Excellent BAC library
• Integrated physical and genetic map
• Large number of BAC-end sequences
Sequence Pipeline at the University of Oklahoma
Genome Center, OU-ACGT
DNA → GenBank
• DNA shearing (Hydroshear™)
• Colony picking (QPixII™)
• Growing subclones (HiGro™)
• Subclone isolation I (Mini-Staccato™)
• Subclone isolation II (VPrep™)
• Thermocycling (ABI 9700)
• Sequencing (ABI 3700)
• Data assembly and analysis
• Primer synthesis
• Miscellaneous liquid handling
• Closure
Subclone Isolation (Mini-Staccato™)
• This Zymark robot has a 384-cannula array, four built-in shakers, three
attached storage racks, built-in barcoding and a Twister II robotic arm.
• This automation has allowed us to perform the DNA isolation completely
unattended from as many as eighty 384-well plates of bacterial cells per
Subclone Isolation (Mini-Staccato™)
• Once all three solutions have been added, the plates are transferred from
the SciClone workspace deck to a storage rack by the Twister II robotic arm.
Subclone Isolation and Sequencing Reaction Pipetting (Velocity 11 VPrep)
• Liquid handling station with 384-channel pipettor head
• Four movable shelves on either side of the pipettor head
• Used for subclone isolation, sequencing reaction set-up and clean-up.
Data assembly and Analysis
• Sun V880 server with 32 GB RAM running Solaris 8 OS and 3 TB of data stored on RAID-5 arrays with autoloader tape backup
• Also: 12 workstations each with 1 GB RAM
• Phred/Phrap/Consed, Exgap
Initial WGS Skimming for ~500 Mb
Medicago truncatula genome
• Collected ~25,000 end-sequences from ~12,500
plasmid-based WGS clones.
• Of these ~25,000 sequences, ~1,000 have
homology with Medicago truncatula ESTs.
• URL:
http://www.genome.ou.edu/medicago.html
Phrap assembly of our Medicago truncatula whole
genome shotgun survey sequencing data
at 0.005-fold genomic sequence coverage
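Fold coverage is the total number of sequenced bases divided by the genome size. The short Python sketch below works through that arithmetic for the figures quoted on these slides (~500 Mb genome, 0.005-fold coverage). The function names are illustrative only and are not part of the OU pipeline; no read length is assumed because the slides do not state one.

```python
def fold_coverage(total_bases: float, genome_size: float) -> float:
    """Fold coverage = sequenced bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


def bases_for_coverage(coverage: float, genome_size: float) -> float:
    """Total bases needed to reach a given fold coverage."""
    return coverage * genome_size


# Figures taken from the slides: ~500 Mb genome, 0.005-fold survey coverage.
GENOME_SIZE = 500e6
SURVEY_COVERAGE = 0.005

# Roughly 2.5 Mb of sequence corresponds to 0.005-fold coverage of 500 Mb.
survey_bases = bases_for_coverage(SURVEY_COVERAGE, GENOME_SIZE)
print(f"{survey_bases / 1e6:.1f} Mb of sequence")
```

At this depth almost every unique region is sampled at most once, so any contigs that do assemble tend to come from sequences present in many copies.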
DotPlot of a Phrap assembled whole genome
shotgun contig showing multiple repeated regions
(Dot plot; both axes: Bases, 0 to 700)
DotPlot of a Phrap assembled whole genome shotgun
contig showing 4 repeated blocks of ~600 bases
(Dot plot; both axes: Bases, 0 to 1000)
Yet another genomic contig showing extensive repeated regions
Contig 1931
(Dot plot; both axes: Bases, 0 to 600)
>Contig1931
TTTACGTCCCCGTAGTGAACTATTTCCTAAGTTGACTAGTCAATTAGGTG
ATAGTTCGTCCGGATGACGTACCGCCGTGAACCCGATATGAGAATTTCAT
GTGGTGCATCCTTCTATGTTTGATAAGGTCATTTTGAACGGTCGGATTGA
ACGTGGCTGGTGTCGTTCACGATAGAGGCACGTTTAGGTCCCTACGGTGA
ACTAGTTCCTAAGTTGACTAGTCAATTAGGTGATAGTTTGTCCGGATGAC
GTACCTCCGTGAACCCGATCTGAGAAATTCAAGTTTCTGCATCCTTCTAT
GTTTGATAAGGTCATTTTGAACGGTCGGATTGAAGGTGGCTGGTGTTCTT
CACATTCTAGGCACGTTTAGGTTCCCGCGGTGAACTAGTTCCTAAGTTGA
CTAGTCAATTAGGTGATAGTTCGTCCGGATGACCTACCTCCGTGAACCCG
ATATTAGAAATTCAAGTTTCTGCATCCTTCTATGTTTGATAAGGTCATTT
TGAACGGTCAGATTGAACGTGGCTGGTGTCGTTCACGATCTAGGCACGTT
TAGGTCCCCGCAGTGAACTAGTTCCTAAGTTGACTAGTCAATTAGGTGAT
AGTTTGTCCGGATGACGTGACTCCGTAAAGCCAGTATGAGAACTTCTAGT
TTCTGCATCCTTTTATGTTTGATAAGGTCATTTTGAACGGTGGGATTGAA
CGTTGTTGGTGTCGTTCACGATCTAGGCACGTTTAGGTCCCCGCAGTGAA
CTAGTTCCTTAGTTGACTAGTCAATTAGGTGATAGTTCGTCCGGATGACG
TATCTCCGTCAGCCCGATCTGAGAAATTCAAATTTCTGCATCCTTCTATG
TTTGATAAGGTCATTTTGAACGGTCGGATTGAACGTGGCTGGTGTCGTGC
ACGATCAAGGCACGTTTAGGTCCCCGCAGCGAACTAGTTCCTAAGTTGAC
TAGTCAATTAGGTGATACCTTGTCCGGATGACGTACCTCCGTGAACCCGA
TCTGAGAAATTCAAGTTTCTGCATCCTTCTATGTTTGATAAGGTCATTTT
GAACGGTTGGATTGAACATGGCTGGTGTCGTTCACGATCTAGGCACGTTT
AGGTCCCCGCAGTGAACTAGTTCCTAAGTTGACTAGTCAATTAGGTGATA
GTTCGTCTGGATGACGTACCTCCTTGAACCCAATATGAGAAATTCAATTT
TCTTCATCCTTCTATGTTTGATAAGGTCATTTTGAACGGTCGGATTGAAC
GTGCCTGGTGTCGTTCACGATCGAGGCACGTTTAGGTCCCCGCAGTGAAC
. . .
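A self dot plot of the kind shown for these contigs can be approximated by recording every pair of positions that share the same k-mer; tandem repeats then appear as diagonals at a fixed offset from the main diagonal. The Python sketch below is a minimal illustration under that assumption, not the program used to draw the slides. The function names, the word size k = 8 and the toy sequence are all hypothetical.

```python
from collections import Counter, defaultdict


def self_dotplot(seq: str, k: int = 8):
    """Return sorted (i, j) pairs, i < j, where the k-mers at i and j match."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    dots = []
    for hits in positions.values():
        # Every pair of occurrences of the same k-mer is one off-diagonal dot.
        for a in range(len(hits)):
            for b in range(a + 1, len(hits)):
                dots.append((hits[a], hits[b]))
    return sorted(dots)


def dominant_offset(dots):
    """Most common j - i distance, a rough estimate of the repeat period."""
    if not dots:
        return None
    offsets = Counter(j - i for i, j in dots)
    return offsets.most_common(1)[0][0]


# Toy input: a 16-base unit repeated three times (not real Medicago sequence).
unit = "ACGTTGCAAGGCTTAC"
toy = unit * 3
period = dominant_offset(self_dotplot(toy, k=8))
print(period)
```

The pairwise loop is quadratic in the number of occurrences of each k-mer, which is acceptable for contigs of a few kilobases but would need a different approach for long, highly repetitive sequences.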
Summary of our Medicago truncatula WGS
Sequencing Assembly with only 0.005-fold
Genomic Sequence Coverage
• The largest contig (21,157 bp) contained the 26S
rRNA genes
• 19 smaller contigs (105,455 bp total) were from the
chloroplast genome
• The remaining ~500 contigs, ranging in size from
2,000 to 12,000 bp, contained highly repetitive DNA
that appeared to be unique to Medicago, as it had no
significant homology in the GenBank database
• We concluded that a more directed strategy was
needed
Mapped BAC approach in
collaboration with Doug Cook
and DJ Kim at U.C. Davis with
funding from the Noble
Foundation, Ardmore, OK
The first ~1000 Medicago truncatula BACs
• Initially concentrated on BACs with known biological markers and in regions of biological interest that were supplied to us by the UC Davis group.
• Requests for sequencing specific BACs were directed to Doug Cook and DJ Kim at UC Davis, and they supplied us with the BACs once these BACs had been characterized.
• Once the BACs were received, we created the shotgun libraries, isolated the sequencing templates and obtained the working draft sequence followed by closure and finishing.
• All data was made publicly available in GenBank within 24 hours of sequence assembly.
UC Davis
--------
Oklahoma
University
Medicago BAC Sequencing
(Chart: Number of Bases, 0 to 100,000,000, versus Date, 4/15/02 to 12/15/03; series: Phase 1, Phase 2, Phase 3, Total)
The next ~750 Medicago truncatula BACs
• With recent NSF funding, we will be sequencing BACs from chromosomes 1, 4, 6, and 8 with the goal of completing the sequence of the euchromatic regions of these chromosomes over the next 3 years.
• Chromosomes 2 and 7 will be sequenced at TIGR, chromosome 3 at The Sanger Institute and chromosome 5 at Genoscope.
• All data will be released immediately as before.
www.genome.ou.edu/medicago.html
www.genome.ou.edu/medicago_totals.html
Medicago-specific gene with
ESTs but no known homology
Gene density of this BAC is ~1 gene per 10 kb
Medicago-specific gene with ESTs but no known homology
myosin-like protein
Gene density ~1 gene per 10 kb
myosin-like protein
Gene Size Distribution (All Sequence Data)
(FgenesH vs. Genscan)
(Histogram: Number of Genes, 0 to 4,500, by Gene Size Range in bins of 1,000 from 1-1000 to 19001-20000, plus 20001-above; series: FgeneSH, Genscan)
13,396 FgeneSH predicted genes
11,488 Genscan predicted genes
Exon Size Distribution (All Sequence Data)
(FgenesH vs. Genscan)
(Histogram: Number of Exons, 0 to 20,000, by Exon Size Range; bins 1-50, 51-100, then bins of 100 from 101-200 to 901-1000, then bins of 500 from 1001-1500 to 3501-4000; series: FgeneSH, Genscan)
59,808 FgeneSH predicted exons
55,792 Genscan predicted exons
Intron Size Distribution (All Sequence Data)
(FgenesH vs. Genscan)
(Histogram: Number of Introns, 0 to 12,000, by Intron Size Range; bins 1-50, 51-100, then bins of 100 from 101-200 to 901-1000, then bins of 500 from 1001-1500 to 3501-4000; series: FgeneSH, Genscan)
46,412 FgeneSH predicted introns
44,305 Genscan predicted introns
Gene Density of the ~450 Mb Medicago truncatula genome

                                           FgeneSH          Genscan
Total number of genes                       13,397           11,488
Total length of genes                   30,793,326       51,687,528
Total exon length                       15,794,243       14,400,445
Total number of exons                       59,808           55,792
Total intron length                     14,999,083       37,287,083
Total number of introns                     46,412           44,305
Base Pairs Sequenced                    87,423,457       87,423,457
Gene Space (Gene Length/BP Sequenced)          35%              59%
Gene Density (Genes/200Mb)                  30,649           26,281
                                     1 gene/6.5 kb    1 gene/7.6 kb

Arabidopsis: 25,498 protein coding genes
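The derived rows of this table (gene space, genes per 200 Mb, and kb per gene) follow directly from the gene counts, total gene lengths and base pairs sequenced. The Python sketch below reproduces that arithmetic as a check; the function name and return layout are illustrative, and the 200 Mb reference size is taken from the row label in the table.

```python
def gene_stats(n_genes: int, gene_length_bp: int, sequenced_bp: int,
               reference_bp: int = 200_000_000):
    """Return (gene space fraction, genes per reference_bp, kb per gene)."""
    gene_space = gene_length_bp / sequenced_bp
    genes_per_reference = n_genes * reference_bp / sequenced_bp
    kb_per_gene = sequenced_bp / n_genes / 1000
    return gene_space, genes_per_reference, kb_per_gene


SEQUENCED_BP = 87_423_457  # base pairs sequenced, from the table

fgenesh = gene_stats(13_397, 30_793_326, SEQUENCED_BP)
genscan = gene_stats(11_488, 51_687_528, SEQUENCED_BP)

for name, (space, per_200mb, kb) in (("FgeneSH", fgenesh), ("Genscan", genscan)):
    print(f"{name}: gene space {space:.0%}, "
          f"{per_200mb:,.0f} genes/200 Mb, 1 gene/{kb:.1f} kb")
```

The two predictors agree closely on exon content but differ in total intron length, which is what drives the gap between the 35% and 59% gene-space estimates.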
Medicago GC Content for ~90 Mb of Genomic BAC
Clones Sequenced (mainly from gene rich regions)
Metabolic Overview of Medicago
13,396 FgeneSH predicted genes using the COG Database
• DNA Metabolism 23%
• Cellular Processes 23%
• Metabolism 24%
• Poorly Characterized 17%
• No Hits 5%
• Multiple COG Hits 8%
Metabolic Overview (detailed view) of Medicago
13,396 FgeneSH predicted genes using the COG Database
• No Hits 5%
• Translation, ribosomal structure & biogenesis 7%
• Transcription 5%
• DNA replication, recombination & repair 11%
• Multiple COG Hits 8%
• Poorly Characterized 17%
• Cell division & chromosome partitioning 2%
• Posttranslational modification, protein turnover, chaperones 5%
• Cell envelope biogenesis, outer membrane 4%
• Cell motility & secretion 3%
• Inorganic ion transport & metabolism 3%
• Signal transduction mechanisms 5%
• Energy production & conversion 5%
• Carbohydrate transport & metabolism 4%
• Amino acid transport & metabolism 5%
• Nucleotide transport & metabolism 2%
• Coenzyme metabolism 2%
• Lipid metabolism 2%
• Secondary metabolites biosynthesis, transport & catabolism 3%
Gene Duplication: Three copies of the phosphoglycerate
kinase gene in one BAC
AC138448.fg.10 MATKRSVGTLKEAELKGKRVFVRVDLNVPLDDNLNITDDTRIRAAVPTIKYLTGYGAKVILSSHL-----
AC138448.fg.11 MA-KKSVGDLSGAELKGKKVFVRADLNVPLDDNQNITDDTRIRAAIPTIKYLIQNGAKVILSSHL-----
AC138448.fg.8 MATKRSVGTLKEGELKGKRVFVRVDLNVPLDDNLNITDDTRIRAAVPTIKYLTGYGAKVILSSHLEIYKT
AC138448.fg.10 ------------------------------------------GRPKGVTPKYSLKPLVPRLSELLGTQVK
AC138448.fg.11 ------------------------------------------GRPKGVTPKYSLAPLVPRLSELIGIEVI
AC138448.fg.8 EVSVSEYNLAVSEYKLAISDTYRYRIRVRHDSSPFLEYRGSQGRPKGVTPKYSLKPLVPRLSELLETQVK
AC138448.fg.10 IADDSIGEEVEKLVAQIPEGGVLLLENVRFHKEEEKNDPEFAKKLASLADLYVNDAFGTAHRAHASTEGV
AC138448.fg.11 KAEDSIGPEVEKLVASLPDGGVLLLENVRFYKEEEKNDPEHAKKLAALADLYVNDAFGTAHRAHASTEGV
AC138448.fg.8 ISDDCIGEEVEKLVAQIPEGGVLLLENVRFHKEEEKNEPEFAKKLASLADLYVNDAFGTAHRAHASTEGV
AC138448.fg.10 AKYLKPSVAGFLMQKELDYLVGAVSNPKKPFAAIVGGSKVSSKIGVIESLLEKVDILLLGGGMIFTFYKA
AC138448.fg.11 TKYLKPSVAGFLLQKELDYLVGAVSSPKRPFAAIVGGSKVSSKIGVIESLLEKVDILLLGGGMIFTFYKA
AC138448.fg.8 AKYLKPSVAGFLMQKELDYLVGAVSNPKKPFAAIVGGSKVSSKIGVIESLLEKVDILLLGGGMIYTFYKA
AC138448.fg.10 QGYAVGSSLVEEDKLDLATTLIEKAKAKGVSLLLPTDVVIADKFAADANDKIVPASSIPDGWMGLDIGPD
AC138448.fg.11 QGLAVGSSLVEEDKLELATTLIAKAKAKGVSLLLPSDVVIADKFAPDANSQIVPASAIPDGWMGLDIGPD
AC138448.fg.8 QGYSIGSSLVEEDKLDLATSLMEKAKAKGVSLLLPTDVVIADKFSADANDKIVPASSIPDGWMGLDIGPD
AC138448.fg.10 SIKTFNEALDKSQTIIWNGPMGVFEFDKFAAGTEAIAKKLAEVSGKGVTTIIGGGDSVAAVEKVGLADKM
AC138448.fg.11 SIKTFNEALDTTQTIIWNGPMGVFEFDKFAVGTESIAKKLADLSGKGVTTIIGGGDSVAAVEKVGVADVM
AC138448.fg.8 SIKTFNEALDKSQTIIWNGPMGVFEFDKFAAGTEAIAKKLAEVSGKGVTTIIGGGDSVAAVEKVGLADKM
AC138448.fg.10 SHISTGGGASLELLEGKPLPGVLALDDA* 401 amino acids
AC138448.fg.11 SHISTGGGASLELLEGKELPGVLALDEATPVAV* 405 amino acids, differs at 42 positions
AC138448.fg.8 SHISTGGGASLELLEGKPLPGVLALDDA* 448 amino acids, differs at 6 positions
Gene Duplication: Three copies of phosphoglycerate kinase in one BAC
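Counts such as "differs at 6 positions" can be obtained by walking two aligned rows column by column. The Python sketch below is one way to do that; it skips columns where either row has a gap, which is an assumption, since the slide does not say how gaps were treated. The function names are illustrative, and the two strings are the first alignment block of AC138448.fg.10 and AC138448.fg.8 copied from above.

```python
def count_differences(row_a: str, row_b: str, gap: str = "-") -> int:
    """Count aligned columns where both rows have a residue and they differ."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    return sum(1 for a, b in zip(row_a, row_b)
               if a != gap and b != gap and a != b)


def ungapped_length(row: str, gap: str = "-") -> int:
    """Number of residues in an aligned row, ignoring gap characters."""
    return len(row.replace(gap, ""))


# First 70-column block of the alignment shown on the slide.
fg10 = "MATKRSVGTLKEAELKGKRVFVRVDLNVPLDDNLNITDDTRIRAAVPTIKYLTGYGAKVILSSHL-----"
fg8 = "MATKRSVGTLKEGELKGKRVFVRVDLNVPLDDNLNITDDTRIRAAVPTIKYLTGYGAKVILSSHLEIYKT"

block_differences = count_differences(fg10, fg8)
print(block_differences, ungapped_length(fg10), ungapped_length(fg8))
```

Applying the same function to every block and summing gives a per-pair total; whether that total matches the figures on the slide depends on the gap convention used there.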
Printrepeat Analysis of
M. truncatula BAC AC121240 vs. A. thaliana Chr.2
Expansion, Duplication, Repeat Elements
~5 kb region
~25 kb region
PIP of M. truncatula BAC AC121240 vs. A. thaliana Chr.2
Medicago truncatula
Summary and Conclusions
• Average predicted gene density of 1 gene per 6.5 to 7.6 kb by FgeneSH and Genscan, respectively.
• Genome characteristics such as %GC, intron/exon size and conserved unique 5’ splice sites reveal Medicago characteristics
• The sequence of the Medicago truncatula genome shows homology to the sequenced Arabidopsis thaliana genome but expansion, rearrangements and duplications are evident.
Data Release and Preliminary Annotation
• All our sequence data is available through links on our web site to GenBank and on our ftp site at URL: ftp.genome.ou.edu/medicago
• Keyword and BLAST searches can be done on our web site at URL: http://www.genome.ou.edu/medicago.html
• Additional annotation via the Genome Browser database is available on our web site at URL: http://www.genome.ou.edu/medicago_table.html
• E-mail suggestions for additional annotation to Bruce Roe at: broe@ou.edu
Three Year Plan
• Obtain the contiguous sequence of the gene-rich regions of four of the 8 Medicago truncatula chromosomes at OU, with the remaining four being completed by our international partners at TIGR, Sanger, and Genoscope.
• This information will serve as a solid foundation for anticipated comparative and functional legume genomics.
Laboratory Organization
Bruce Roe, PI
Informatics
Support Teams
Production Administration
Jim White
Steve Kenton
Hongshing Lai
Sean Qian
Rose Morales-Diaz*
Mounir Elharam*
Yonas Tesfai
Steve Shaull**
Doug White
Work-study Undergraduates**
Kay Lynn Hale
Dixie Wishnuck
Tami Womack
Mary Catherine Williams
DNA Synthesis
Phoebe Loh*
Sulan Qi
Bart Ford*
Reagents & Equip. Maint.
Mounir Elharam*
Doug White
Axin Hua
Weihong Xu
Jami Milam
Sara Downard**
Limei Yang
Angie Prescott*
Audra Wendt**
Mandi Aycock**
Ziyun Yao
Steve Shaull*
Youngju Yoon
Trang Do
Anh Do
Lily Fu
Yang Ye
James Yu
Tessa Manning**
Fu Ying
Liping Zhou
Ruihua Shi
Junjie Wu
Stephan Deschamps
Shelly Oommen
Christopher Lau
Yanhong Li
Research Teams
Doris Kupfer
Julia Kim*
Sun So
Graham Wiley**
Lauren Ritterhouse**
Lin Song
Ying Ni
Huarong Jiang
ShaoPing Lin
Honggui Jia
Hongming Wu
Baifang Qin
Peng Zhang
Fares Najar
Chunmei Qu
Keqin Wang
Carson Qu
Shuling Li
Funding from the Noble Foundation, DOE, and NSF
Collaborators at Univ. Minnesota, UC Davis, TIGR,
Sanger, Genoscope, and the Noble Foundation
Pheobe Loh **
Sulan Qi
Bart Ford*
* Previous undergraduate research student
** Present undergraduate research student
The AACCGGTT Team
Conserved Intron/Exon Boundary Features by a FELINES**
Analysis of 181,444 Medicago truncatula ESTs in GenBank
vs Genomic Sequence

            Size Range         Mean Length
Exons       6 - 5,789 nt       268 nt
Introns     20 - 3,921 nt      429 nt

Intron Conserved Splice Site Sequence Elements     Percent
Introns w/ 5’ GU                                   99.21%
Introns w/ 5’ GC                                   0.36%*
Introns w/ 5’ AU                                   0.31%
Introns w/ U12 branch sites instead of A12         0.13%

*Compared to 0.5 - 2.5% in fungi, and 0.5% in mammals with an EST minimum identity of 90%
** S. Drabenstot, D. Kupfer, J. White, D. Dyer, B. Roe, K. Buchanan and J. Murphy.
FELINES: A Utility for Extracting and Examining EST-Defined Introns and Exons.
Nucleic Acids Research 31(22), E141 (2003).
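The percentages of 5’ GU, GC and AU introns amount to a tally of the first two bases of each intron. The Python sketch below shows that tally on a handful of made-up intron strings. It is not the FELINES implementation, the function names are hypothetical, and the toy sequences are not Medicago data.

```python
from collections import Counter


def five_prime_class(intron: str) -> str:
    """First two bases of an intron, in the RNA alphabet (e.g. 'GU')."""
    return intron[:2].upper().replace("T", "U")


def splice_class_percentages(introns):
    """Percentage of introns in each 5' dinucleotide class."""
    counts = Counter(five_prime_class(s) for s in introns if len(s) >= 2)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cls: 100.0 * n / total for cls, n in counts.items()}


# Toy introns given as DNA, 5' to 3' (invented for illustration only).
toy_introns = ["GTAAGTTTCAG", "gtatgtctttag", "GCAAGTATTAG", "ATATCCTTTAC"]
percentages = splice_class_percentages(toy_introns)
print(percentages)
```

Introns are assumed to be supplied already extracted and oriented 5’ to 3’; deriving them from EST-to-genome alignments is the part that a tool such as FELINES handles.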
Consensus Logogram of the 5’GU vs the 5’AU Class of Introns
in Medicago truncatula determined by FELINES
AU intron consensus
GU intron consensus

More Related Content

What's hot

Genetic and Molecular Characterization of a Dental Pathogen Using a Genome-Wi...
Genetic and Molecular Characterization of a Dental Pathogen Using a Genome-Wi...Genetic and Molecular Characterization of a Dental Pathogen Using a Genome-Wi...
Genetic and Molecular Characterization of a Dental Pathogen Using a Genome-Wi...shabeel pn
 
Genome sequencing and the development of our current information library
Genome sequencing and the development of our current information libraryGenome sequencing and the development of our current information library
Genome sequencing and the development of our current information libraryZarlishAttique1
 
GIAB Sep2016 Lightning mason chris_epi_qc
GIAB Sep2016 Lightning mason chris_epi_qcGIAB Sep2016 Lightning mason chris_epi_qc
GIAB Sep2016 Lightning mason chris_epi_qcGenomeInABottle
 
2015 Bioc4010 lecture1and2
2015 Bioc4010 lecture1and22015 Bioc4010 lecture1and2
2015 Bioc4010 lecture1and2Dan Gaston
 
Jan2016 bio nano han cao
Jan2016 bio nano han caoJan2016 bio nano han cao
Jan2016 bio nano han caoGenomeInABottle
 
Sept2016 newsample cancer_craig
Sept2016 newsample cancer_craigSept2016 newsample cancer_craig
Sept2016 newsample cancer_craigGenomeInABottle
 
Rice genome sequencing by utkarsh
Rice genome sequencing by utkarshRice genome sequencing by utkarsh
Rice genome sequencing by utkarshutkarsh2011
 
RNA-Seq transcriptome analysis of Gonium pectorale cell cycle.
RNA-Seq transcriptome analysis of Gonium pectorale cell cycle.RNA-Seq transcriptome analysis of Gonium pectorale cell cycle.
RNA-Seq transcriptome analysis of Gonium pectorale cell cycle.Jennifer Shelton
 
Credit seminar on rice genomics crrected
Credit seminar on rice genomics crrectedCredit seminar on rice genomics crrected
Credit seminar on rice genomics crrectedVarsha Gayatonde
 
RNA-Seq Analysis of Blueberry Fruit Development and Ripening
RNA-Seq Analysis of Blueberry Fruit Development and RipeningRNA-Seq Analysis of Blueberry Fruit Development and Ripening
RNA-Seq Analysis of Blueberry Fruit Development and RipeningAnn Loraine
 
RNA-Seq analysis of blueberry fruit identifies candidate genes involved in ri...
RNA-Seq analysis of blueberry fruit identifies candidate genes involved in ri...RNA-Seq analysis of blueberry fruit identifies candidate genes involved in ri...
RNA-Seq analysis of blueberry fruit identifies candidate genes involved in ri...Ann Loraine
 
Next-generation sequencing from 2005 to 2020
Next-generation sequencing from 2005 to 2020Next-generation sequencing from 2005 to 2020
Next-generation sequencing from 2005 to 2020Christian Frech
 
Institute of Learning in Retirement - Miami University (Ohio)
Institute of Learning in Retirement - Miami University (Ohio)Institute of Learning in Retirement - Miami University (Ohio)
Institute of Learning in Retirement - Miami University (Ohio)Andor Kiss
 
Overview on arabidopsis and rice genome
Overview on arabidopsis and rice genomeOverview on arabidopsis and rice genome
Overview on arabidopsis and rice genomeGopal Singh
 

What's hot (20)

Genetic and Molecular Characterization of a Dental Pathogen Using a Genome-Wi...
Genetic and Molecular Characterization of a Dental Pathogen Using a Genome-Wi...Genetic and Molecular Characterization of a Dental Pathogen Using a Genome-Wi...
Genetic and Molecular Characterization of a Dental Pathogen Using a Genome-Wi...
 
Bioinformatics at IITA
Bioinformatics at IITABioinformatics at IITA
Bioinformatics at IITA
 
Genome sequencing and the development of our current information library
Genome sequencing and the development of our current information libraryGenome sequencing and the development of our current information library
Genome sequencing and the development of our current information library
 
GIAB Sep2016 Lightning mason chris_epi_qc
GIAB Sep2016 Lightning mason chris_epi_qcGIAB Sep2016 Lightning mason chris_epi_qc
GIAB Sep2016 Lightning mason chris_epi_qc
 
2015 Bioc4010 lecture1and2
2015 Bioc4010 lecture1and22015 Bioc4010 lecture1and2
2015 Bioc4010 lecture1and2
 
Jan2016 bio nano han cao
Jan2016 bio nano han caoJan2016 bio nano han cao
Jan2016 bio nano han cao
 
CSHL
CSHLCSHL
CSHL
 
Plant genome project
Plant genome projectPlant genome project
Plant genome project
 
Genome editing
Genome editingGenome editing
Genome editing
 
150224 grc kms
150224 grc kms150224 grc kms
150224 grc kms
 
Sept2016 newsample cancer_craig
Sept2016 newsample cancer_craigSept2016 newsample cancer_craig
Sept2016 newsample cancer_craig
 
Rice genome sequencing by utkarsh
Rice genome sequencing by utkarshRice genome sequencing by utkarsh
Rice genome sequencing by utkarsh
 
RNA-Seq transcriptome analysis of Gonium pectorale cell cycle.
RNA-Seq transcriptome analysis of Gonium pectorale cell cycle.RNA-Seq transcriptome analysis of Gonium pectorale cell cycle.
RNA-Seq transcriptome analysis of Gonium pectorale cell cycle.
 
Credit seminar on rice genomics crrected
Credit seminar on rice genomics crrectedCredit seminar on rice genomics crrected
Credit seminar on rice genomics crrected
 
RNA-Seq Analysis of Blueberry Fruit Development and Ripening
RNA-Seq Analysis of Blueberry Fruit Development and RipeningRNA-Seq Analysis of Blueberry Fruit Development and Ripening
RNA-Seq Analysis of Blueberry Fruit Development and Ripening
 
RNA-Seq analysis of blueberry fruit identifies candidate genes involved in ri...
RNA-Seq analysis of blueberry fruit identifies candidate genes involved in ri...RNA-Seq analysis of blueberry fruit identifies candidate genes involved in ri...
RNA-Seq analysis of blueberry fruit identifies candidate genes involved in ri...
 
Next-generation sequencing from 2005 to 2020
Next-generation sequencing from 2005 to 2020Next-generation sequencing from 2005 to 2020
Next-generation sequencing from 2005 to 2020
 
Institute of Learning in Retirement - Miami University (Ohio)
Institute of Learning in Retirement - Miami University (Ohio)Institute of Learning in Retirement - Miami University (Ohio)
Institute of Learning in Retirement - Miami University (Ohio)
 
Overview on arabidopsis and rice genome
Overview on arabidopsis and rice genomeOverview on arabidopsis and rice genome
Overview on arabidopsis and rice genome
 
171017 giab for giab grc workshop
171017 giab for giab grc workshop171017 giab for giab grc workshop
171017 giab for giab grc workshop
 

Viewers also liked

Viewers also liked (18)

Script-distorted truth
Script-distorted truthScript-distorted truth
Script-distorted truth
 
Acquire Case Studies Digital Promotion
Acquire Case Studies Digital PromotionAcquire Case Studies Digital Promotion
Acquire Case Studies Digital Promotion
 
iglesia de cabanillas del campo
iglesia de cabanillas del campoiglesia de cabanillas del campo
iglesia de cabanillas del campo
 
אירגון בקול מאמר
אירגון בקול מאמראירגון בקול מאמר
אירגון בקול מאמר
 
Acquire Case Studies Increasing Patient Volume
Acquire Case Studies Increasing Patient VolumeAcquire Case Studies Increasing Patient Volume
Acquire Case Studies Increasing Patient Volume
 
Atahualpa and Francisco Pizarro
Atahualpa and Francisco PizarroAtahualpa and Francisco Pizarro
Atahualpa and Francisco Pizarro
 
Passe compose 1
Passe compose 1Passe compose 1
Passe compose 1
 
trabajo
trabajotrabajo
trabajo
 
The Descent - Mise En Scene
The Descent - Mise En SceneThe Descent - Mise En Scene
The Descent - Mise En Scene
 
Valve corporation
Valve corporationValve corporation
Valve corporation
 
Market Study and Feasibility of Amphibious Vehicles in Goa
Market Study and Feasibility of Amphibious Vehicles in GoaMarket Study and Feasibility of Amphibious Vehicles in Goa
Market Study and Feasibility of Amphibious Vehicles in Goa
 
Levels
LevelsLevels
Levels
 
Franch vocabulary(1)
Franch vocabulary(1)Franch vocabulary(1)
Franch vocabulary(1)
 
Skin Test
Skin TestSkin Test
Skin Test
 
Permaculture design d'Antoine Talin : le Jardin des Cairns à Grenoble
Permaculture design d'Antoine Talin : le Jardin des Cairns à GrenoblePermaculture design d'Antoine Talin : le Jardin des Cairns à Grenoble
Permaculture design d'Antoine Talin : le Jardin des Cairns à Grenoble
 
Budget
BudgetBudget
Budget
 
Ciudades inteligentes
Ciudades  inteligentesCiudades  inteligentes
Ciudades inteligentes
 
L'agroécologie pour tous - N° 1 - Sept. 2012
L'agroécologie pour tous - N° 1 - Sept. 2012L'agroécologie pour tous - N° 1 - Sept. 2012
L'agroécologie pour tous - N° 1 - Sept. 2012
 

Similar to PAG-2004-Roe

Bioinformatics final
Bioinformatics finalBioinformatics final
Bioinformatics finalRainu Rajeev
 
Johannes Bergsten Dna Barcoding
Johannes Bergsten Dna BarcodingJohannes Bergsten Dna Barcoding
Johannes Bergsten Dna Barcodingbioinfocourse
 
Towards a Reference Genome for Switchgrass (Panicum virgatum) - Schmutz jeremy
Towards a Reference Genome for Switchgrass (Panicum virgatum) - Schmutz jeremyTowards a Reference Genome for Switchgrass (Panicum virgatum) - Schmutz jeremy
Towards a Reference Genome for Switchgrass (Panicum virgatum) - Schmutz jeremyShaojun Xie
 
Variant (SNP) calling - an introduction (with a worked example, using FreeBay...
Variant (SNP) calling - an introduction (with a worked example, using FreeBay...Variant (SNP) calling - an introduction (with a worked example, using FreeBay...
Variant (SNP) calling - an introduction (with a worked example, using FreeBay...Manikhandan Mudaliar
 
Lab2_3_Lecture_DNA_PCR (3).pptx
Lab2_3_Lecture_DNA_PCR (3).pptxLab2_3_Lecture_DNA_PCR (3).pptx
Lab2_3_Lecture_DNA_PCR (3).pptxkarlos64
 
Marzillier_09052014.pdf
Marzillier_09052014.pdfMarzillier_09052014.pdf
Marzillier_09052014.pdf7006ASWATHIRR
 
introduction to Genomics
introduction to Genomics introduction to Genomics
introduction to Genomics IqraSami3
 
What should Bioinformatics do for EvoDevo?
What should Bioinformatics do for EvoDevo?What should Bioinformatics do for EvoDevo?
What should Bioinformatics do for EvoDevo?ylog
 
126 micro array study for gene expression
126 micro array study for gene expression126 micro array study for gene expression
126 micro array study for gene expressionSHAPE Society
 
Giab jan2016 intro and update 160128
Giab jan2016 intro and update 160128Giab jan2016 intro and update 160128
Giab jan2016 intro and update 160128GenomeInABottle
 
Role of bioinformatics in life sciences research
Role of bioinformatics in life sciences researchRole of bioinformatics in life sciences research
Role of bioinformatics in life sciences researchAnshika Bansal
 
The Emerging Global Community of Microbial Metagenomics Researchers
The Emerging Global Community of Microbial Metagenomics ResearchersThe Emerging Global Community of Microbial Metagenomics Researchers
The Emerging Global Community of Microbial Metagenomics ResearchersLarry Smarr
 
Third Generation Sequencing
Third Generation Sequencing Third Generation Sequencing
Third Generation Sequencing priyanka raviraj
 
Building a Community Cyberinfrastructure to Support Marine Microbial Ecology ...
PAG-2004-Roe

  • 1. The Medicago truncatula genome: a progress report. Dr. Bruce A. Roe, Advanced Center for Genome Technology, Department of Chemistry and Biochemistry, University of Oklahoma. broe@ou.edu, www.genome.ou.edu. Plant and Animal Genome, San Diego, January 11, 2004. Photos by Steve Hughes, Genetic Resource Centre (PIRSA-SARDI), Adelaide, Australia. http://www.fao.org/ag/AGP/AGPC/doc/gallery/pictures/meditrunc/meditrunc.htm
  • 2. Why sequence the Medicago genome? • An important forage crop • A genetically tractable model legume • A relatively small (~500 Mbp) diploid genome • Active legume research community • Medicago Research Consortium • Large collection of ESTs • Excellent BAC library • Integrated physical and genetic map • Large number of BAC-end sequences
  • 3. Sequence Pipeline at the University of Oklahoma Genome Center, OU-ACGT. From DNA to GenBank: DNA shearing (Hydroshear™); Colony Picking (QPixII™); Growing subclones (HiGro™); Subclone Isolation I (Mini-Staccato™); Subclone Isolation II (VPrep™); Thermocycling (ABI 9700); Sequencing (ABI 3700); Data assembly and Analysis. Also shown: Primer Synthesis; Miscellaneous liquid handling; Closure.
  • 4. Subclone Isolation (Mini-Staccato™) • This Zymark robot has a 384-cannula array, four built-in shakers, three attached storage racks, built-in barcoding and a Twister II robotic arm. • This automation has allowed us to perform the DNA isolation completely unattended from as many as eighty 384-well plates of bacterial cells per …
  • 5. Subclone Isolation (Mini-Staccato™) • Once all three solutions have been added, the plates are transferred from the SciClone workspace deck to a storage rack by the Twister II robotic arm.
  • 6. Subclone Isolation and Sequencing Reaction Pipetting (Velocity 11 VPrep) • Liquid handling station with 384-channel pipettor head • Four movable shelves on either side of the pipettor head • Used for subclone isolation, sequencing reaction set-up and clean-up.
  • 7. Data assembly and Analysis • Sun V880 server: 32 GB RAM running Solaris 8 OS and 3 TB of data stored on RAID-5 arrays with autoloader tape backup • Also: 12 workstations each with 1 GB RAM • Phred/Phrap/Consed, Exgap
  • 8. Initial WGS Skimming for the ~500 Mb Medicago truncatula genome • Collected ~25,000 end-sequences from ~12,500 plasmid-based WGS clones. • Of these ~25,000 sequences, ~1,000 have homology with Medicago truncatula ESTs. • URL: http://www.genome.ou.edu/medicago.html
  • 9. Phrap assembly of our Medicago truncatula whole genome shotgun survey sequencing data at 0.005-fold genomic sequence coverage
  • 10. DotPlot of a Phrap assembled whole genome shotgun contig showing multiple repeated regions (both axes: Bases, 0 to 700)
  • 11. DotPlot of a Phrap assembled whole genome shotgun contig showing 4 repeated blocks of ~600 bases (both axes: Bases, 0 to 1000)
  • 12. Yet another genomic contig showing extensive repeated regions: Contig 1931 (both axes: Bases, 0 to 600)
  • 13. >Contig1931 TTTACGTCCCCGTAGTGAACTATTTCCTAAGTTGACTAGTCAATTAGGTG ATAGTTCGTCCGGATGACGTACCGCCGTGAACCCGATATGAGAATTTCAT GTGGTGCATCCTTCTATGTTTGATAAGGTCATTTTGAACGGTCGGATTGA ACGTGGCTGGTGTCGTTCACGATAGAGGCACGTTTAGGTCCCTACGGTGA ACTAGTTCCTAAGTTGACTAGTCAATTAGGTGATAGTTTGTCCGGATGAC GTACCTCCGTGAACCCGATCTGAGAAATTCAAGTTTCTGCATCCTTCTAT GTTTGATAAGGTCATTTTGAACGGTCGGATTGAAGGTGGCTGGTGTTCTT CACATTCTAGGCACGTTTAGGTTCCCGCGGTGAACTAGTTCCTAAGTTGA CTAGTCAATTAGGTGATAGTTCGTCCGGATGACCTACCTCCGTGAACCCG ATATTAGAAATTCAAGTTTCTGCATCCTTCTATGTTTGATAAGGTCATTT TGAACGGTCAGATTGAACGTGGCTGGTGTCGTTCACGATCTAGGCACGTT TAGGTCCCCGCAGTGAACTAGTTCCTAAGTTGACTAGTCAATTAGGTGAT AGTTTGTCCGGATGACGTGACTCCGTAAAGCCAGTATGAGAACTTCTAGT TTCTGCATCCTTTTATGTTTGATAAGGTCATTTTGAACGGTGGGATTGAA CGTTGTTGGTGTCGTTCACGATCTAGGCACGTTTAGGTCCCCGCAGTGAA CTAGTTCCTTAGTTGACTAGTCAATTAGGTGATAGTTCGTCCGGATGACG TATCTCCGTCAGCCCGATCTGAGAAATTCAAATTTCTGCATCCTTCTATG TTTGATAAGGTCATTTTGAACGGTCGGATTGAACGTGGCTGGTGTCGTGC ACGATCAAGGCACGTTTAGGTCCCCGCAGCGAACTAGTTCCTAAGTTGAC TAGTCAATTAGGTGATACCTTGTCCGGATGACGTACCTCCGTGAACCCGA TCTGAGAAATTCAAGTTTCTGCATCCTTCTATGTTTGATAAGGTCATTTT GAACGGTTGGATTGAACATGGCTGGTGTCGTTCACGATCTAGGCACGTTT AGGTCCCCGCAGTGAACTAGTTCCTAAGTTGACTAGTCAATTAGGTGATA GTTCGTCTGGATGACGTACCTCCTTGAACCCAATATGAGAAATTCAATTT TCTTCATCCTTCTATGTTTGATAAGGTCATTTTGAACGGTCGGATTGAAC GTGCCTGGTGTCGTTCACGATCGAGGCACGTTTAGGTCCCCGCAGTGAAC . . .
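The dot plots on slides 10 to 12 compare a contig against itself, so repeated blocks show up as diagonals offset from the main diagonal. The slides do not say which program drew them; the following is only a minimal Python sketch of the general idea, assuming exact word matches. The function names `self_dotplot` and `common_offsets` and the word size `k` are illustrative choices, not part of the original analysis.

```python
from collections import Counter


def self_dotplot(seq, k=12):
    """Return sorted (i, j) pairs, i < j, where the k-mers at i and j are identical.

    Each pair is one off-diagonal dot in a self-versus-self dot plot.
    """
    positions = {}
    for i in range(len(seq) - k + 1):
        positions.setdefault(seq[i:i + k], []).append(i)
    hits = []
    for starts in positions.values():
        for a in range(len(starts)):
            for b in range(a + 1, len(starts)):
                hits.append((starts[a], starts[b]))
    return sorted(hits)


def common_offsets(hits):
    """Count dots per diagonal (offset j - i), most frequent first.

    A tandem repeat of period p produces many dots at offset p, 2p, ...
    """
    return Counter(j - i for i, j in hits).most_common()


if __name__ == "__main__":
    # Synthetic example: a 16-base unit repeated three times.
    unit = "ACGTTGCAAGGCTTAC"
    seq = unit * 3
    for offset, count in common_offsets(self_dotplot(seq, k=8)):
        print(offset, count)
```

Applied to a real contig such as Contig1931, the most frequent offsets would suggest the repeat period, although diverged copies need a shorter word size or a mismatch-tolerant comparison.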
  • 14. Summary of our Medicago truncatula WGS Sequencing Assembly with only 0.005-fold Genomic Sequence Coverage • The largest contig (21,157 bp) contained the 26S rRNA genes • 19 smaller contigs (105,455 bp total) were from the chloroplast genome • The remaining ~500 contigs, ranging in size from 2,000 to 12,000 bp, contained highly repetitive DNA that was unique to Medicago, as it had no significant homology in the GenBank database • We concluded that a more directed strategy was needed
  • 15. Mapped BAC approach in collaboration with Doug Cook and DJ Kim at U.C. Davis with funding from the Noble Foundation, Ardmore, OK
  • 16. The first ~1000 Medicago truncatula BACs • Initially concentrated on BACs with known biological markers and in regions of biological interest that were supplied to us by the UC Davis group. • Requests for sequencing specific BACs were directed to Doug Cook and DJ Kim at UC Davis, and they supplied us with the BACs once these BACs had been characterized. • Once the BACs were received, we created the shotgun libraries, isolated the sequencing templates and obtained the working draft sequence, followed by closure and finishing. • All data was made publicly available in GenBank within 24 hours of sequence assembly.
  • 19. The next ~750 Medicago truncatula BACs • With recent NSF funding, we will be sequencing BACs from chromosomes 1, 4, 6, and 8 with the goal of completing the sequence of the euchromatic regions of these chromosomes over the next 3 years. • Chromosomes 2 and 7 will be sequenced at TIGR, chromosome 3 at The Sanger Institute and chromosome 5 at Genoscope. • All data will be released immediately as before.
  • 22. Medicago-specific gene with ESTs but no known homology. Gene density of this BAC is ~1 gene per 10 kb.
  • 23. Medicago-specific gene with ESTs but no known homology
  • 27. Gene Size Distribution (All Sequence Data), FgeneSH vs. Genscan. Histogram of Number of Genes by Gene Size Range (bins of 1,000 from 1-1000 to 19001-20000, plus 20001 and above). 13,396 FgeneSH predicted genes; 11,488 Genscan predicted genes.
  • 28. Exon Size Distribution (All Sequence Data), FgeneSH vs. Genscan. Histogram of Number of Exons by Exon Size Range (bins from 1-50 to 3501-4000). 59,808 FgeneSH predicted exons; 55,792 Genscan predicted exons.
  • 29. Intron Size Distribution (All Sequence Data), FgeneSH vs. Genscan. Histogram of Number of Introns by Intron Size Range (bins from 1-50 to 3501-4000). 46,412 FgeneSH predicted introns; 44,305 Genscan predicted introns.
  • 30. Gene Density of the ~450 Mb Medicago truncatula genome
                                                    FgeneSH         Genscan
        Total number of genes                        13,397          11,488
        Total length of genes                    30,793,326      51,687,528
        Total exon length                        15,794,243      14,400,445
        Total number of exons                        59,808          55,792
        Total intron length                      14,999,083      37,287,083
        Total number of introns                      46,412          44,305
        Base Pairs Sequenced                     87,423,457      87,423,457
        Gene Space (Gene Length/BP Sequenced)           35%             59%
        Gene Density (Genes/200Mb)                   30,649          26,281
                                              1 gene/6.5 kb   1 gene/7.6 kb
        Arabidopsis: 25,498 protein coding genes
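The derived rows of the slide 30 table (gene space, kb per gene, and genes per 200 Mb) follow from the raw counts by simple ratios. The sketch below reproduces that arithmetic in Python; the function name `gene_stats` and its parameter names are illustrative assumptions, and the 200 Mb target is taken from the table's "Genes/200Mb" row.

```python
def gene_stats(n_genes, total_gene_len, bp_sequenced, target_bp=200_000_000):
    """Return (gene_space, kb_per_gene, projected_genes) from raw counts.

    gene_space      = total gene length / base pairs sequenced
    kb_per_gene     = base pairs sequenced / number of genes, in kb
    projected_genes = gene count scaled linearly to target_bp of sequence
    """
    gene_space = total_gene_len / bp_sequenced
    kb_per_gene = bp_sequenced / n_genes / 1000
    projected_genes = n_genes / bp_sequenced * target_bp
    return gene_space, kb_per_gene, projected_genes


if __name__ == "__main__":
    bp = 87_423_457  # base pairs sequenced, from the slide
    predictions = {
        "FgeneSH": (13_397, 30_793_326),
        "Genscan": (11_488, 51_687_528),
    }
    for name, (genes, gene_len) in predictions.items():
        space, kb, projected = gene_stats(genes, gene_len, bp)
        print(f"{name}: gene space {space:.0%}, "
              f"1 gene/{kb:.1f} kb, {projected:,.0f} genes per 200 Mb")
```

The projection is a linear extrapolation from BACs that were chosen mainly from gene-rich regions, so it is best read as an estimate for that fraction of the genome rather than for the genome as a whole.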
  • 31. Medicago GC Content for ~90 Mb of Genomic BAC Clones Sequenced (mainly from gene rich regions)
  • 32. Metabolic Overview of Medicago: 13,396 FgeneSH predicted genes using the COG Database. DNA Metabolism 23%; Cellular Processes 23%; Metabolism 24%; Poorly Characterized 17%; No Hits 5%; Multiple COG Hits 8%.
  • 33. Metabolic Overview (detailed view) of Medicago: 13,396 FgeneSH predicted genes using the COG Database. No Hits 5%; Translation, ribosomal structure & biogenesis 7%; Transcription 5%; DNA replication, recombination & repair 11%; Multiple COG Hits 8%; Poorly Characterized 17%; Cell division & chromosome partitioning 2%; Posttranslational modification, protein turnover, chaperones 5%; Cell envelope biogenesis, outer membrane 4%; Cell motility & secretion 3%; Inorganic ion transport & metabolism 3%; Signal transduction mechanisms 5%; Energy production & conversion 5%; Carbohydrate transport & metabolism 4%; Amino acid transport & metabolism 5%; Nucleotide transport & metabolism 2%; Coenzyme metabolism 2%; Lipid metabolism 2%; Secondary metabolites biosynthesis, transport & catabolism 3%.
  • 34. Gene Duplication: Three copies of the phosphoglycerate kinase gene in one BAC
  • 35. Gene Duplication: Three copies of phosphoglycerate kinase in one BAC
        AC138448.fg.10 MATKRSVGTLKEAELKGKRVFVRVDLNVPLDDNLNITDDTRIRAAVPTIKYLTGYGAKVILSSHL-----
        AC138448.fg.11 MA-KKSVGDLSGAELKGKKVFVRADLNVPLDDNQNITDDTRIRAAIPTIKYLIQNGAKVILSSHL-----
        AC138448.fg.8  MATKRSVGTLKEGELKGKRVFVRVDLNVPLDDNLNITDDTRIRAAVPTIKYLTGYGAKVILSSHLEIYKT

        AC138448.fg.10 ------------------------------------------GRPKGVTPKYSLKPLVPRLSELLGTQVK
        AC138448.fg.11 ------------------------------------------GRPKGVTPKYSLAPLVPRLSELIGIEVI
        AC138448.fg.8  EVSVSEYNLAVSEYKLAISDTYRYRIRVRHDSSPFLEYRGSQGRPKGVTPKYSLKPLVPRLSELLETQVK

        AC138448.fg.10 IADDSIGEEVEKLVAQIPEGGVLLLENVRFHKEEEKNDPEFAKKLASLADLYVNDAFGTAHRAHASTEGV
        AC138448.fg.11 KAEDSIGPEVEKLVASLPDGGVLLLENVRFYKEEEKNDPEHAKKLAALADLYVNDAFGTAHRAHASTEGV
        AC138448.fg.8  ISDDCIGEEVEKLVAQIPEGGVLLLENVRFHKEEEKNEPEFAKKLASLADLYVNDAFGTAHRAHASTEGV

        AC138448.fg.10 AKYLKPSVAGFLMQKELDYLVGAVSNPKKPFAAIVGGSKVSSKIGVIESLLEKVDILLLGGGMIFTFYKA
        AC138448.fg.11 TKYLKPSVAGFLLQKELDYLVGAVSSPKRPFAAIVGGSKVSSKIGVIESLLEKVDILLLGGGMIFTFYKA
        AC138448.fg.8  AKYLKPSVAGFLMQKELDYLVGAVSNPKKPFAAIVGGSKVSSKIGVIESLLEKVDILLLGGGMIYTFYKA

        AC138448.fg.10 QGYAVGSSLVEEDKLDLATTLIEKAKAKGVSLLLPTDVVIADKFAADANDKIVPASSIPDGWMGLDIGPD
        AC138448.fg.11 QGLAVGSSLVEEDKLELATTLIAKAKAKGVSLLLPSDVVIADKFAPDANSQIVPASAIPDGWMGLDIGPD
        AC138448.fg.8  QGYSIGSSLVEEDKLDLATSLMEKAKAKGVSLLLPTDVVIADKFSADANDKIVPASSIPDGWMGLDIGPD

        AC138448.fg.10 SIKTFNEALDKSQTIIWNGPMGVFEFDKFAAGTEAIAKKLAEVSGKGVTTIIGGGDSVAAVEKVGLADKM
        AC138448.fg.11 SIKTFNEALDTTQTIIWNGPMGVFEFDKFAVGTESIAKKLADLSGKGVTTIIGGGDSVAAVEKVGVADVM
        AC138448.fg.8  SIKTFNEALDKSQTIIWNGPMGVFEFDKFAAGTEAIAKKLAEVSGKGVTTIIGGGDSVAAVEKVGLADKM

        AC138448.fg.10 SHISTGGGASLELLEGKPLPGVLALDDA*       401 amino acids
        AC138448.fg.11 SHISTGGGASLELLEGKELPGVLALDEATPVAV*  405 amino acids, differs at 42 positions
        AC138448.fg.8  SHISTGGGASLELLEGKPLPGVLALDDA*       448 amino acids, differs at 6 positions
  • 36. Printrepeat Analysis of M. truncatula BAC AC121240 vs. A. thaliana Chr.2: Expansion, Duplication, Repeat Elements (~5 kb region; ~25 kb region)
  • 37. PIP of M. truncatula BAC AC121240 vs. A. thaliana Chr.2
  • 38. Medicago truncatula Summary and Conclusions • Average Predicted Gene Density of 1 gene per 6.5 to 7.6 kb by FgeneSH and Genscan, respectively. • Genome characteristics such as %GC, intron/exon size and conserved unique 5’ splice sites reveal Medicago characteristics • The sequence of the Medicago truncatula genome shows homology to the sequenced Arabidopsis thaliana genome but expansion, rearrangements and duplications are evident.
  • 39. Data Release and Preliminary Annotation • All our sequence data is available through links on our web site to GenBank and on our ftp site at URL: ftp.genome.ou.edu/medicago • Keyword and BLAST searches can be done on our web site at URL: http://www.genome.ou.edu/medicago.html • Additional annotation via the Genome Browser database is available on our web site at URL: http://www.genome.ou.edu/medicago_table.html • E-mail suggestions for additional annotation to Bruce Roe at: broe@ou.edu
  • 40. Three Year Plan • Obtain the contiguous sequence of the Gene Rich regions of four of the 8 Medicago truncatula chromosomes at OU, with the remaining four being completed by our international partners at TIGR, Sanger, and Genoscope. • This information will serve as a solid foundation for anticipated comparative and functional legume genomics.
  • 41. Laboratory Organization. Bruce Roe, PI. Informatics, Support Teams, Production, Administration. Jim White, Steve Kenton, Hongshing Lai, Sean Qian, Rose Morales-Diaz*, Mounir Elharam*, Yonas Tesfai, Steve Shaull**, Doug White, Work-study Undergraduates**, Kay Lynn Hale, Dixie Wishnuck, Tami Womack, Mary Catherine Williams. DNA Synthesis: Phoebe Loh*, Sulan Qi, Bart Ford*. Reagents & Equip. Maint.: Mounir Elharam*, Doug White, Axin Hua, Weihong Xu, Jami Milam, Sara Downard**, Limei Yang, Angie Prescott*, Audra Wendt**, Mandi Aycock**, Ziyun Yao, Steve Shaull*, Youngju Yoon, Trang Do, Anh Do, Lily Fu, Yang Ye, James Yu, Tessa Manning**, Fu Ying, Liping Zhou, Ruihua Shi, Junjie Wu, Stephan Deschamps, Shelly Oommen, Christopher Lau, Yanhong Li. Research Teams: Doris Kupfer, Julia Kim*, Sun So, Graham Wiley**, Lauren Ritterhouse**, Lin Song, Ying Ni, Huarong Jiang, ShaoPing Lin, Honggui Jia, Hongming Wu, Baifang Qin, Peng Zhang, Fares Najar, Chunmei Qu, Keqin Wang, Carson Qu, Shuling Li. Funding from the Noble Foundation, DOE, and NSF. Collaborators at Univ. Minnesota, UC Davis, TIGR, Sanger, Genoscope, and the Noble Foundation. * Previous undergraduate research student. ** Present undergraduate research student.
  • 44. Conserved Intron/Exon Boundary Features by a FELINES** Analysis of 181,444 Medicago truncatula ESTs in GenBank vs Genomic Sequence, with an EST minimum identity of 90%
                    Size Range        Mean Length
        Exons       6 - 5,789 nt      268 nt
        Introns     20 - 3,921 nt     429 nt

        Intron Conserved Splice Site Sequence Elements      Percent
        Introns w/ 5’ GU                                    99.21%
        Introns w/ 5’ GC                                    0.36%*
        Introns w/ 5’ AU                                    0.31%
        Introns w/ U12 branch sites instead of A12          0.13%

        * Compared to 0.5 - 2.5% in fungi, and 0.5% in mammals.
        ** S. Drabenstot, D. Kupfer, J. White, D. Dyer, B. Roe, K. Buchanan and J. Murphy. FELINES: A Utility for Extracting and Examining EST-Defined Introns and Exons. Nucleic Acids Research 31(22), E141 (2003).
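The splice-site percentages on slide 44 come from FELINES, whose interface is not shown in the slides. As a rough illustration of the tally itself, the sketch below classifies introns by their first two bases and reports the share of each 5’ class. It is not the FELINES implementation; the function name `five_prime_classes` and the example introns are made up for illustration, and DNA input is converted to RNA letters so that GT is counted as GU.

```python
from collections import Counter


def five_prime_classes(introns):
    """Return the percentage of introns in each 5' dinucleotide class.

    Classes are GU, GC, AU and "other"; T is treated as U so that
    genomic (DNA) intron sequences can be passed in directly.
    """
    counts = Counter()
    for intron in introns:
        dinuc = intron[:2].upper().replace("T", "U")
        counts[dinuc if dinuc in ("GU", "GC", "AU") else "other"] += 1
    total = sum(counts.values())
    return {cls: 100 * n / total for cls, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical intron sequences, for illustration only.
    example = ["GTAAGTCAG", "gtatgttag", "GCAAGTAAG", "ATATCCTAC"]
    for cls, pct in sorted(five_prime_classes(example).items()):
        print(f"5' {cls}: {pct:.2f}%")
```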
  • 45. Consensus Logogram of the 5’ GU vs the 5’ AU Class of Introns in Medicago truncatula determined by FELINES (panels: AU intron consensus; GU intron consensus)