Identifying conserved promoter motifs and transcription factor binding sites in orthologous plant promoter collections
Endre Sebestyén, ARI-HAS, Martonvásár
8th International Symposium on “New development in green gene technology”, 1-4 September 2009, Szeged, Hungary
Transcription factors and binding sites
Transcription factors (TFs) bind short, often degenerate DNA sequences: the transcription factor binding sites (TFBSs)
The promoter is an upstream sequence of variable length that contains the TFBSs
TFBSs are usually conserved, while the surrounding sequence is not
Some well-known TFBSs
TATA-box
GC-box
CpG islands
Many other, less general TFBSs
Similarly expressed genes, or homologs, should contain similar TFBSs in their promoters
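To illustrate what "degenerate" means in practice, here is a minimal Python sketch that searches a sequence for a consensus written with IUPAC ambiguity codes. The promoter fragment is made up, and TATAWAW is used only as an often-quoted form of the TATA-box consensus; neither comes from the talk.

```python
import re

# Standard IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def consensus_to_regex(consensus):
    """Compile a degenerate consensus into a regex that also finds overlapping hits."""
    body = "".join(IUPAC[code] for code in consensus.upper())
    # A lookahead with a capture group lets finditer report overlapping matches.
    return re.compile("(?=(" + body + "))")

def find_sites(consensus, sequence):
    """Return (0-based start, matched text) for every match on the given strand."""
    pattern = consensus_to_regex(consensus)
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence.upper())]

# Hypothetical promoter fragment, for illustration only.
promoter = "GGCCTATAAATAGCGCGTATATATCC"
hits = find_sites("TATAWAW", promoter)
print(hits)
```

Only the forward strand is searched here; a real scan would usually repeat the search on the reverse complement.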
Promoter analysis/TFBS search
Wet-lab methods
DNase footprinting
Electrophoretic mobility shift assay
ChIP-chip, ChIP-Seq
Bioinformatic methods
Search for experimentally verified sites
Consensus sequences
Matrices
De novo motif discovery
Oligo frequency
Phylogenetic footprinting
Other methods
A  30   0   0   0  20
C   0  25   1   0   2
G   0   1  31   0   3
T   2   6   0  32   7
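The count matrix above can be turned into a scoring matrix and slid along a sequence. The sketch below is one common way to do this, not necessarily the one used in the talk: the pseudocount of 1, the uniform background of 0.25, and the test sequence are all assumptions made for illustration.

```python
import math

# Counts copied from the example matrix above (rows A, C, G, T; five positions).
PFM = {
    "A": [30, 0, 0, 0, 20],
    "C": [0, 25, 1, 0, 2],
    "G": [0, 1, 31, 0, 3],
    "T": [2, 6, 0, 32, 7],
}

def pfm_to_pwm(pfm, pseudocount=1.0, background=0.25):
    """Convert counts to log2-odds scores (pseudocount and background are assumptions)."""
    width = len(pfm["A"])
    pwm = {base: [] for base in "ACGT"}
    for i in range(width):
        total = sum(pfm[base][i] for base in "ACGT") + 4 * pseudocount
        for base in "ACGT":
            freq = (pfm[base][i] + pseudocount) / total
            pwm[base].append(math.log2(freq / background))
    return pwm

def scan(pwm, sequence):
    """Score every window of motif width; windows with non-ACGT letters are skipped."""
    width = len(pwm["A"])
    sequence = sequence.upper()
    results = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if set(window) <= set("ACGT"):
            score = sum(pwm[base][i] for i, base in enumerate(window))
            results.append((start, window, score))
    return results

pwm = pfm_to_pwm(PFM)
# Hypothetical sequence, for illustration only.
best = max(scan(pwm, "TTACGTACCGTA"), key=lambda hit: hit[2])
print(best[0], best[1], round(best[2], 2))
```

In practice a score threshold decides which windows count as predicted sites, and the choice of threshold strongly affects the false positive rate.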
Experimentally verified sites
TRANSFAC, JASPAR, PLACE, PlantCARE
De novo motif discovery
Orthologous gene groups
Evolutionarily conserved functional sites
Co-regulated genes
In same tissue, body part
Developmental stage
Etc
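As a toy version of oligo-frequency motif discovery, the following Python sketch reports the k-mers that occur in at least a given fraction of a promoter set, whether the set comes from orthologs or from co-regulated genes. The function name, the default k of 6, and the three short sequences are hypothetical; real tools also model background frequencies and allow mismatches.

```python
from collections import Counter

def shared_kmers(promoters, k=6, min_fraction=1.0):
    """Return the k-mers present in at least min_fraction of the sequences."""
    support = Counter()
    for seq in promoters:
        seq = seq.upper()
        # A set ensures each k-mer is counted at most once per sequence.
        kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        support.update(kmers)
    needed = min_fraction * len(promoters)
    return sorted(kmer for kmer, n in support.items() if n >= needed)

# Hypothetical promoter fragments, each built around the G-box core CACGTG.
promoters = [
    "TTGACACGTGGCAT",
    "CCACGTGTAAGCTA",
    "GATCCACGTGAACT",
]
print(shared_kmers(promoters))
```

Lowering `min_fraction` tolerates sequences that lack the motif, at the cost of more chance hits.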
“Real” promoter structure
No general motifs
No TATA-box, GC-box, etc.
Lots of false positive TFBSs
With wet-lab methods and most in silico methods
Sometimes no apparent common TFBSs between co-regulated genes
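A rough calculation shows why short motifs produce so many false positives in silico. The sketch below estimates how often a motif is expected to match by chance, assuming independent positions with uniform base composition (0.25 per base). That assumption is a simplification, and the function name and parameters are hypothetical.

```python
from math import prod

def expected_chance_hits(allowed_bases_per_position, promoter_length, both_strands=True):
    """Expected number of chance matches of a motif in a random sequence.

    allowed_bases_per_position: for each motif position, how many of the four
    bases are accepted (1 = fixed base, 4 = any base).
    Assumes independent, uniformly distributed bases.
    """
    k = len(allowed_bases_per_position)
    positions = max(0, promoter_length - k + 1)
    p_match = prod(n / 4 for n in allowed_bases_per_position)
    strands = 2 if both_strands else 1
    return positions * p_match * strands

# A fully specified 6-mer in promoters of the three lengths used in DoOP.
for length in (500, 1000, 3000):
    print(length, round(expected_chance_hits([1] * 6, length), 2))
```

For an exact 6-mer on one strand of a 3000 bp promoter this gives 2995/4096, about 0.73 chance matches, before any degeneracy is allowed. Counting both strands doubles the figure, which overcounts for palindromic motifs.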
Database of Orthologous Promoters
Orthologous promoter sequence collections
Based on BLAST searches with the first exons of genes from a reference species
Plants (Viridiplantae)
Reference species: Arabidopsis thaliana
Chordates
Reference species: human
500/1000/3000 bp 5’ upstream regions
Conserved sequence regions
Annotations
Cross-references to other databases
Annotated transcription start sites
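The fixed-length upstream regions can be pictured with a small Python sketch that cuts the sequence 5' of a transcription start site, reverse-complementing it for genes on the minus strand. This is not the DoOP pipeline itself; the function names, the 0-based coordinate convention, and the toy sequence are assumptions made for illustration.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def upstream_region(chromosome, tss, strand, length):
    """Return up to `length` bases 5' of the transcription start site.

    tss is the 0-based index of the first transcribed base (an assumed
    convention). The result is always written 5'->3' relative to the gene
    and is truncated at the sequence ends.
    """
    if strand == "+":
        start = max(0, tss - length)
        return chromosome[start:tss]
    if strand == "-":
        end = min(len(chromosome), tss + 1 + length)
        return reverse_complement(chromosome[tss + 1:end])
    raise ValueError("strand must be '+' or '-'")

# Toy sequence, for illustration only.
chromosome = "AAACCCGGGTTT"
print(upstream_region(chromosome, 6, "+", 3))
print(upstream_region(chromosome, 5, "-", 3))
```

Calling the function with lengths of 500, 1000, and 3000 would give the three promoter sizes listed above.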
Database of Orthologous Promoters http://doop.abc.hu
DoOP - Clusters
DoOP - Subsets
Cluster > Subset
Subset: a collection of evolutionarily monophyletic sequences within a cluster
Plant subsets
Brassicaceae
Arabidopsis thaliana
Brassica species
Eudicotyledons
Grape, Solanum species, papaya, tobacco
Magnoliophyta
Maize, rice
Viridiplantae
Gene types (Gene Ontology)
Standardized annotation for genes
Biological process
What does it do?
transcription, translation, stress response, etc
Cellular component
Where is it located?
membrane, ribosome, cytosol, etc
Molecular function
How does it work?
dehydrogenase, ATP binding, etc
Gene types (Gene Ontology)
500 bp promoters
Search for significantly enriched GO terms in the annotation
Brassicaceae
Eudicotyledons
Magnoliophyta
Viridiplantae
Biological process (BP): transcription, translation, protein folding, stress response
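Term enrichment of this kind is commonly assessed with a one-sided hypergeometric test. The talk does not state which test was used, so the Python sketch below is only an illustration, and all gene counts in it are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(population, annotated, sample, hits):
    """P(X >= hits) when drawing `sample` genes without replacement.

    population: all genes considered (the background set)
    annotated:  background genes carrying the GO term
    sample:     genes in the subset of interest
    hits:       subset genes carrying the GO term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total

# Hypothetical numbers: 2000 background genes, 100 annotated with a term,
# 50 genes in the subset, 10 of them carrying the term.
p_value = hypergeom_enrichment_p(2000, 100, 50, 10)
print(f"{p_value:.3g}")
```

Because many GO terms are tested at once, the resulting p-values would normally be corrected for multiple testing before a term is called enriched.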