MicrobeDB
Centralized storage of, and access to, completed Archaea and Bacteria
genomes
Same as those available at:
http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi
Genome/Flat files are stored in one central location (optional)
E.g. “/var/opt/NCBI_genomes”
Information at the genome project, chromosome, and gene levels is parsed
and stored in a MySQL database (MicrobeDB),
including sequences and annotations
The files and the database can be easily updated monthly
MicrobeDB MySQL Schema
[Figure: MicrobeDB MySQL schema]
October 23, 2006 Brinkman Lab Presentation
MicrobeDB Overview
[Figure: MicrobeDB workflow]
Loading: NCBI -> Download -> Parse Annotation -> Load MicrobeDB (Perl API) -> MicrobeDB (MySQL)
Access: MicrobeDB (MySQL) -> Access MicrobeDB (Perl API) -> Your Perl Script -> Results
MicrobeDB Versioning
Version
Each monthly download from NCBI is given a new version number
Advantages
Data will not change as long as you always use the same MicrobeDB version number
Version date can be cited for any method publications
Disadvantages
Data is redundant in the database (e.g. multiple versions of the same gene)
A version number (version_id) must always be specified when retrieving
information; otherwise, a copy from every stored version will be returned
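The effect of versioning on retrieval can be illustrated with a small, self-contained Perl sketch. It does not touch the real database: the rows below are invented stand-ins for gene records, and the helper genes_for_version is hypothetical. Only the field names version_id, locus_tag, and gene_product come from MicrobeDB.

```perl
use strict;
use warnings;

# Hypothetical rows: the same gene is stored once per monthly version.
my @gene_rows = (
    { version_id => 1, locus_tag => 'XX_0001', gene_product => 'hypothetical protein' },
    { version_id => 2, locus_tag => 'XX_0001', gene_product => 'DNA polymerase III subunit beta' },
    { version_id => 2, locus_tag => 'XX_0002', gene_product => '16S ribosomal RNA' },
);

# Return only the rows that belong to one version (illustrative helper).
sub genes_for_version {
    my ($version_id, @rows) = @_;
    return grep { $_->{version_id} == $version_id } @rows;
}

# Without a version filter, gene XX_0001 comes back once per stored version.
my @unfiltered = grep { $_->{locus_tag} eq 'XX_0001' } @gene_rows;

# With a version filter, the same gene comes back exactly once.
my @version_2 = grep { $_->{locus_tag} eq 'XX_0001' } genes_for_version(2, @gene_rows);

printf "copies without version filter: %d\n", scalar @unfiltered;
printf "copies with version_id = 2:    %d\n", scalar @version_2;
```

The same reasoning applies to real queries: fixing version_id first keeps each gene, replicon, or genome project unique in the result.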
MicrobeDB Tables
Genomeproject
Contains information about the genome project and the organism that was sequenced
E.g. taxon_id, org_name, lineage, gram_stain, genome_gc, patho_status, disease, genome_size,
pathogenic_in, temp_range, habitat, shape, arrangement, endospore, motility, salinity, etc.
Each genomeproject contains one or more replicons
Replicon
Chromosomes or plasmids
E.g. rep_accnum, definition, rep_type, rep_ginum, cds_num, gene_num, protein_num, genome_id, rep_size,
rna_num, rep_seq (complete nucleotide sequence)
Each replicon contains one or more genes
Gene
Contains gene annotations as well as the DNA sequence and, for protein-coding genes, the protein sequence
E.g. gid, pid, protein_accnum, gene_type, gene_start, gene_end, gene_length, gene_strand, gene_name,
locus_tag, gene_product, gene_seq, protein_seq
Accessing MicrobeDB
Direct MySQL access
username: perlapi
password: microbedb
OR
Using the MicrobeDB Perl API
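For the direct MySQL route, a query should carry a version_id restriction. The sketch below only builds a parameterized SQL string and its bind values; it does not connect to anything. It assumes a table named replicon with version_id and rep_type columns, which is inferred from the field names used in this presentation rather than confirmed against the schema. The helper replicon_query is hypothetical.

```perl
use strict;
use warnings;

# Build a version-restricted query for replicons (illustrative helper).
# Assumption: the table is called "replicon" and has version_id and
# rep_type columns; check the actual schema before relying on this.
sub replicon_query {
    my (%args) = @_;

    # Refuse to build a query that would return every stored version.
    die "version_id is required\n" unless defined $args{version_id};

    my $sql  = 'SELECT rep_accnum, definition, rep_size FROM replicon WHERE version_id = ?';
    my @bind = ( $args{version_id} );

    # Optionally narrow the result to one replicon type.
    if ( defined $args{rep_type} ) {
        $sql .= ' AND rep_type = ?';
        push @bind, $args{rep_type};
    }

    return ( $sql, @bind );
}

my ( $sql, @bind ) = replicon_query( version_id => 1, rep_type => 'chromosome' );
print "$sql\n";
print "bind values: @bind\n";

# With a connected DBI handle the statement could then be run as, for example:
#   my $rows = $dbh->selectall_arrayref( $sql, { Slice => {} }, @bind );
```

Using placeholders keeps the values out of the SQL text, and making version_id mandatory in the helper guards against accidentally retrieving every version.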
MicrobeDB API
Perl Classes/Modules (use these classes)
GenomeProject
Replicon
Gene
Search
MicrobeDB API Example
#Use the MicrobeDB Search library
use strict;
use warnings;
use lib 'microbedb/MicrobeDB';
use Search;

#Create an object with the features that we want (i.e. only chromosomes)
my $rep_obj = Replicon->new( version_id => '1', rep_type => 'chromosome' );

#Create the search object
my $search_obj = Search->new();

#This does the actual search
my @result_objs = $search_obj->object_search($rep_obj);

#Now we can iterate through each replicon object that was returned
foreach my $curr_rep_obj (@result_objs) {

    #get the replicon accession
    my $rep_accnum = $curr_rep_obj->rep_accnum();

    #get all genes associated with this chromosome
    my $genes = $curr_rep_obj->genes();

    #print out all genes that are annotated as 16S rRNA in FASTA format
    foreach my $curr_gene (@$genes) {

        #Only look at rRNA genes
        if ( $curr_gene->gene_type() eq 'rRNA' ) {
            my $rna_product = $curr_gene->gene_product();

            #check to see if the gene is annotated as a 16S rRNA
            if ( defined($rna_product) && $rna_product =~ /16s/i ) {
                my $gid   = $curr_gene->gid();
                my $start = $curr_gene->gene_start();
                my $end   = $curr_gene->gene_end();
                my $seq   = $curr_gene->gene_seq();

                #print out the gene in FASTA format
                print ">$rep_accnum|gi:$gid|$start - $end|$rna_product\n";
                print $seq . "\n";
            }
        }
    }
}
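The example prints each sequence on a single line, which is valid FASTA but awkward for long genes. As an optional refinement, a small helper could wrap the sequence at a fixed width. The function format_fasta, the default width of 60, and the sample header values below are illustrative assumptions and are not part of the MicrobeDB API.

```perl
use strict;
use warnings;

# Format one FASTA record, wrapping the sequence at $width characters per line.
sub format_fasta {
    my ( $header, $seq, $width ) = @_;
    $width ||= 60;    # assumed default line width

    my $record = ">$header\n";

    # Walk through the sequence in chunks of $width characters.
    for ( my $pos = 0 ; $pos < length($seq) ; $pos += $width ) {
        $record .= substr( $seq, $pos, $width ) . "\n";
    }
    return $record;
}

# Hypothetical values standing in for the fields read from a gene object.
my $header = 'NC_000000|gi:12345|100 - 109|16S ribosomal RNA';
print format_fasta( $header, 'ACGTACGTAC', 4 );
```

Inside the loop, the two print statements could then be replaced by a single print of format_fasta with the same header string and $seq.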