Querying Chado




Monday, February 16, 2009
Slides from a short tutorial on querying Chado databases using SQL, given for the Pathogen Genomics group at the Wellcome Trust Sanger Institute on 12th February 2009.

Published in: Technology
0 Comments
2 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
721
On Slideshare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
13
Comments
0
Likes
2
Embeds 0
No embeds

No notes for slide








































  • Querying Chado.Key

    1. Querying Chado (Monday, February 16, 2009)
    2. Overview
        • Relational databases
        • Chado
        • Writing queries
        • Saving the results
        • More examples
    3. Relational database
        • Data are organised in tables:
        • The columns of the table represent attributes,
        • The rows represent entities.
    4–9. A conventional genomic database

        organism
        organism_id  genus           species     strain
        1            Staphylococcus  aureus      MRSA252
        2            Plasmodium      falciparum  3D7
        3            Schistosoma     mansoni

        gene
        gene_id  organism_id  locus_tag
        1        1            SAR0001
        2        1            SAR0002
        3        2            PFA00005
        4        2            PFB00010
        5        3            Smp_012250
        6        3            Smp_152680

        • Primary key: gene_id identifies each row of gene.
        • Foreign key: organism_id in gene refers to a row of organism.
        • Attribute: a column such as locus_tag.
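The two tables above can be tried out without a database account. The following is a minimal sketch that assumes SQLite (through Python's built-in sqlite3 module) as a stand-in for the PostgreSQL server used in the tutorial; the rows are the ones shown on the slide, and the column types are guesses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table organism (
        organism_id integer primary key,   -- primary key
        genus       text not null,         -- attributes
        species     text not null,
        strain      text
    );
    create table gene (
        gene_id     integer primary key,   -- primary key
        organism_id integer not null
                    references organism (organism_id),  -- foreign key
        locus_tag   text not null          -- attribute
    );
    insert into organism values
        (1, 'Staphylococcus', 'aureus', 'MRSA252'),
        (2, 'Plasmodium', 'falciparum', '3D7'),
        (3, 'Schistosoma', 'mansoni', null);
    insert into gene values
        (1, 1, 'SAR0001'), (2, 1, 'SAR0002'),
        (3, 2, 'PFA00005'), (4, 2, 'PFB00010'),
        (5, 3, 'Smp_012250'), (6, 3, 'Smp_152680');
""")

# Follow the foreign key from gene to organism to find each gene's species.
rows = conn.execute("""
    select gene.locus_tag, organism.genus, organism.species
    from gene
    join organism on organism.organism_id = gene.organism_id
    order by gene.gene_id
""").fetchall()

for locus_tag, genus, species in rows:
    print(locus_tag, genus, species)
```

The join is what the foreign key is for: each gene row stores only an organism_id, and the organism's name is looked up when needed.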
    10. A conventional genomic database
        • The organism and gene tables are joined by further tables of the same kind: transcript, exon, chromosome, &c.
    11–12. The core of Chado
        Organism   organism_id, genus, species
        CV         cv_id, name
        CVterm     cvterm_id, cv_id, name
        Feature    feature_id, organism_id, type_id, uniquename, name, residues
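As a rough illustration of how these four tables fit together, here is a sketch using SQLite in place of the real PostgreSQL database. The column types, the cvterm_id values and every uniquename beginning with toy_ are made up for the example; only the table and column names come from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table organism (organism_id integer primary key, genus text, species text);
    create table cv (cv_id integer primary key, name text);
    create table cvterm (
        cvterm_id integer primary key,
        cv_id     integer references cv (cv_id),
        name      text
    );
    create table feature (
        feature_id  integer primary key,
        organism_id integer references organism (organism_id),
        type_id     integer references cvterm (cvterm_id),
        uniquename  text,
        name        text,
        residues    text
    );
    insert into organism values (1, 'Staphylococcus', 'aureus (MRSA252)');
    insert into cv values (10, 'sequence');
    insert into cvterm values
        (101, 10, 'gene'), (102, 10, 'exon'), (103, 10, 'chromosome');
    -- Hypothetical rows: genes, exons and chromosomes all live in one table,
    -- and type_id says which kind of feature each row is.
    insert into feature (feature_id, organism_id, type_id, uniquename) values
        (1, 1, 103, 'toy_chromosome'),
        (2, 1, 101, 'toy_gene'),
        (3, 1, 102, 'toy_gene:exon:1');
""")

types = conn.execute("""
    select feature.uniquename, cvterm.name
    from feature
    join cvterm on cvterm.cvterm_id = feature.type_id
    order by feature.feature_id
""").fetchall()
print(types)
```

Where the conventional schema needs a new table for each kind of feature, this layout needs only a new cvterm row.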
    13. Connecting to the database
        • Make sure you have an account on the database,
        • Log on to pcs4,
        • Type “chado”,
        • Enter your database password.
    14. Connecting to the database

        Welcome to psql 8.2.5, the PostgreSQL interactive terminal.

        Type:  \copyright for distribution terms
               \h for help with SQL commands
               \? for help with psql commands
               \g or terminate with semicolon to execute query
               \q to quit

        malaria_workshop=>
    15–16. Example queries
        \d cv

        • \d is for ‘describe’.
    17–20. Example queries
        select * from cv;

        • * means ‘all columns’.
        • cv is the name of the table.
        • Queries end with a semicolon.
    21–23. Example queries
        \d cvterm

        select name from cvterm
        where cv_id = 10;

        • Just seeing the terms like this is pretty baffling. If you want to understand the structure of the ontology better, you can download OBO-Edit from oboedit.org, and the sequence ontology from sequenceontology.org.
    24–25. Example queries
        select name from cvterm
        where cv_id = 10;

        select cvterm.name
        from cvterm
        join cv on cv.cv_id = cvterm.cv_id
        where cv.name = 'sequence';

        select species from organism
        where genus = 'Staphylococcus';
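The first two queries ask for the same terms: one by the numeric cv_id, the other by the name of the controlled vocabulary. A small sketch, again assuming SQLite as a stand-in for PostgreSQL and using invented rows (toy_other_cv and toy_term are hypothetical), shows that they agree:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table cv (cv_id integer primary key, name text);
    create table cvterm (cvterm_id integer primary key, cv_id integer, name text);
    insert into cv values (10, 'sequence'), (11, 'toy_other_cv');
    insert into cvterm values
        (101, 10, 'gene'), (102, 10, 'pseudogene'), (201, 11, 'toy_term');
""")

# Look the terms up by the numeric identifier of the vocabulary...
by_id = conn.execute(
    "select name from cvterm where cv_id = 10 order by name"
).fetchall()

# ...or by the vocabulary's name, joining cvterm to cv.
by_name = conn.execute("""
    select cvterm.name
    from cvterm
    join cv on cv.cv_id = cvterm.cv_id
    where cv.name = 'sequence'
    order by cvterm.name
""").fetchall()

print(by_id == by_name, by_name)
```

The join is longer to type, but it does not depend on knowing which cv_id a particular database happens to have given the vocabulary.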
    26. Count the genes in MRSA252
        select count(*)
        from feature gene
        where gene.type_id in (
                select cvterm.cvterm_id
                from cvterm
                join cv on cv.cv_id = cvterm.cv_id
                where cv.name = 'sequence'
                  and cvterm.name = 'gene'
            )
          and gene.organism_id in (
                select organism_id
                from organism
                where genus = 'Staphylococcus'
                  and species = 'aureus (MRSA252)'
            );
    27. Editing queries
        • Now type \e (for “edit”),
        • Change “gene” to “pseudogene”,
        • The query will run again, and count the pseudogenes.
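Swapping “gene” for “pseudogene” by hand can also be expressed as a query parameter. The sketch below assumes SQLite in place of PostgreSQL, a hypothetical helper called count_features, and invented feature rows; the query itself follows the one on the slide, with the term name replaced by a placeholder.

```python
import sqlite3

COUNT_SQL = """
    select count(*)
    from feature
    where feature.type_id in (
            select cvterm.cvterm_id
            from cvterm
            join cv on cv.cv_id = cvterm.cv_id
            where cv.name = 'sequence'
              and cvterm.name = ?
        )
      and feature.organism_id in (
            select organism_id
            from organism
            where genus = 'Staphylococcus'
              and species = 'aureus (MRSA252)'
        )
"""


def count_features(conn, type_name):
    """Count MRSA252 features whose type is the named 'sequence' term."""
    return conn.execute(COUNT_SQL, (type_name,)).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table organism (organism_id integer primary key, genus text, species text);
    create table cv (cv_id integer primary key, name text);
    create table cvterm (cvterm_id integer primary key, cv_id integer, name text);
    create table feature (
        feature_id integer primary key, organism_id integer,
        type_id integer, uniquename text
    );
    insert into organism values
        (1, 'Staphylococcus', 'aureus (MRSA252)'),
        (2, 'Plasmodium', 'falciparum');
    insert into cv values (10, 'sequence');
    insert into cvterm values (101, 10, 'gene'), (102, 10, 'pseudogene');
    -- Hypothetical features: three genes and one pseudogene in MRSA252,
    -- plus one gene in another organism that must not be counted.
    insert into feature values
        (1, 1, 101, 'toy_gene_a'),
        (2, 1, 101, 'toy_gene_b'),
        (3, 1, 101, 'toy_gene_c'),
        (4, 1, 102, 'toy_pseudogene'),
        (5, 2, 101, 'toy_plasmodium_gene');
""")

gene_count = count_features(conn, "gene")
pseudogene_count = count_features(conn, "pseudogene")
print(gene_count, pseudogene_count)
```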
    28–30. More Chado tables
        • Locations are stored in the table featureloc.

        Featureloc
            featureloc_id
            feature_id       refers to the gene
            srcfeature_id    refers to the chromosome
            fmin, fmax       interbase coordinates
            strand           1 (forward) or -1 (reverse)
            locgroup, rank   both 0 for the principal location

        • Interbase coordinates count the boundaries between bases (0, 1, 2, 3, 4, 5 etc.), for example 13–18 in:
            ACGGTCCATACGGTCCATACGGTCCATCGGTTA
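Interbase coordinates label the gaps between bases rather than the bases themselves, which is also how Python slices work. A short sketch using the sequence from the slide, with a hypothetical helper name, makes the 13–18 example concrete:

```python
sequence = "ACGGTCCATACGGTCCATACGGTCCATCGGTTA"


def interbase_slice(residues, fmin, fmax):
    """Return the bases lying between interbase positions fmin and fmax."""
    return residues[fmin:fmax]


region = interbase_slice(sequence, 13, 18)
length = 18 - 13  # in interbase coordinates, length is simply fmax - fmin
print(region, length)  # prints: TCCAT 5
```

In one-based, inclusive numbering the same region would be bases 14 to 18, which is why fmin often looks one smaller than the start coordinate shown by other tools.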
    31–32. Location example
        • Find the mean gene length of MRSA252 genes on the forward strand.

        select avg(geneloc.fmax - geneloc.fmin)
        from feature gene
        join featureloc geneloc on geneloc.feature_id = gene.feature_id
        where gene.type_id in (
                select cvterm.cvterm_id
                from cvterm
                join cv on cv.cv_id = cvterm.cv_id
                where cv.name = 'sequence'
                  and cvterm.name = 'gene'
            )
          and gene.organism_id in (
                select organism_id
                from organism
                where genus = 'Staphylococcus'
                  and species = 'aureus (MRSA252)'
            )
          and geneloc.locgroup = 0
          and geneloc.rank = 0
          and geneloc.strand = 1;
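To see what this query computes, it can be run against a handful of invented locations. The sketch assumes SQLite instead of PostgreSQL, and all the feature names and coordinates are made up: two forward-strand genes of 300 and 500 bases and one reverse-strand gene that the strand filter should ignore.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table organism (organism_id integer primary key, genus text, species text);
    create table cv (cv_id integer primary key, name text);
    create table cvterm (cvterm_id integer primary key, cv_id integer, name text);
    create table feature (
        feature_id integer primary key, organism_id integer,
        type_id integer, uniquename text
    );
    create table featureloc (
        featureloc_id integer primary key, feature_id integer, srcfeature_id integer,
        fmin integer, fmax integer, strand integer, locgroup integer, rank integer
    );
    insert into organism values (1, 'Staphylococcus', 'aureus (MRSA252)');
    insert into cv values (10, 'sequence');
    insert into cvterm values (101, 10, 'gene'), (103, 10, 'chromosome');
    insert into feature values
        (1, 1, 103, 'toy_chromosome'),
        (2, 1, 101, 'toy_gene_a'),
        (3, 1, 101, 'toy_gene_b'),
        (4, 1, 101, 'toy_gene_c');
    -- Hypothetical interbase locations on the toy chromosome.
    insert into featureloc values
        (1, 2, 1,  100,  400,  1, 0, 0),
        (2, 3, 1, 1000, 1500,  1, 0, 0),
        (3, 4, 1, 2000, 2900, -1, 0, 0);
""")

mean_length = conn.execute("""
    select avg(geneloc.fmax - geneloc.fmin)
    from feature gene
    join featureloc geneloc on geneloc.feature_id = gene.feature_id
    where gene.type_id in (
            select cvterm.cvterm_id
            from cvterm
            join cv on cv.cv_id = cvterm.cv_id
            where cv.name = 'sequence'
              and cvterm.name = 'gene'
        )
      and gene.organism_id in (
            select organism_id
            from organism
            where genus = 'Staphylococcus'
              and species = 'aureus (MRSA252)'
        )
      and geneloc.locgroup = 0
      and geneloc.rank = 0
      and geneloc.strand = 1
""").fetchone()[0]
print(mean_length)
```

Because the coordinates are interbase, fmax - fmin is the gene length with no off-by-one correction.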
    33–34. Another location example
        • How many genes are on each chromosome in Plasmodium falciparum?

        select chromosome.uniquename as chromosome
             , count(*) as "number of genes"
        from feature gene
        join featureloc geneloc on geneloc.feature_id = gene.feature_id
        join feature chromosome on geneloc.srcfeature_id = chromosome.feature_id
        where gene.type_id in (
                select cvterm.cvterm_id
                from cvterm
                join cv on cv.cv_id = cvterm.cv_id
                where cv.name = 'sequence'
                  and cvterm.name = 'gene'
            )
          and gene.organism_id in (
                select organism_id
                from organism
                where genus = 'Plasmodium'
                  and species = 'falciparum'
            )
          and geneloc.locgroup = 0
          and geneloc.rank = 0
        group by chromosome.uniquename
        ;
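The grouping step is easier to follow on a tiny data set. This sketch assumes SQLite in place of PostgreSQL and uses invented chromosomes and genes (every toy_ name is hypothetical); the only change to the query is an added order by, so that the printed order is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table organism (organism_id integer primary key, genus text, species text);
    create table cv (cv_id integer primary key, name text);
    create table cvterm (cvterm_id integer primary key, cv_id integer, name text);
    create table feature (
        feature_id integer primary key, organism_id integer,
        type_id integer, uniquename text
    );
    create table featureloc (
        featureloc_id integer primary key, feature_id integer, srcfeature_id integer,
        fmin integer, fmax integer, strand integer, locgroup integer, rank integer
    );
    insert into organism values (2, 'Plasmodium', 'falciparum');
    insert into cv values (10, 'sequence');
    insert into cvterm values (101, 10, 'gene'), (103, 10, 'chromosome');
    insert into feature values
        (1, 2, 103, 'toy_chr1'),
        (2, 2, 103, 'toy_chr2'),
        (3, 2, 101, 'toy_gene_a'),
        (4, 2, 101, 'toy_gene_b'),
        (5, 2, 101, 'toy_gene_c');
    -- Two hypothetical genes on toy_chr1 and one on toy_chr2.
    insert into featureloc values
        (1, 3, 1, 100, 400,  1, 0, 0),
        (2, 4, 1, 900, 1500, -1, 0, 0),
        (3, 5, 2, 50,  650,  1, 0, 0);
""")

counts = conn.execute("""
    select chromosome.uniquename as chromosome
         , count(*) as "number of genes"
    from feature gene
    join featureloc geneloc on geneloc.feature_id = gene.feature_id
    join feature chromosome on geneloc.srcfeature_id = chromosome.feature_id
    where gene.type_id in (
            select cvterm.cvterm_id
            from cvterm
            join cv on cv.cv_id = cvterm.cv_id
            where cv.name = 'sequence'
              and cvterm.name = 'gene'
        )
      and gene.organism_id in (
            select organism_id
            from organism
            where genus = 'Plasmodium'
              and species = 'falciparum'
        )
      and geneloc.locgroup = 0
      and geneloc.rank = 0
    group by chromosome.uniquename
    order by chromosome.uniquename
""").fetchall()
print(counts)
```

Note that the feature table is joined twice under different aliases, once as the gene and once as the chromosome it sits on.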
    35. Transcripts and exons
        Feature_relationship
            subject_id    refers to a feature
            object_id     refers to a feature
            type_id       refers to a cvterm

        • Each exon is related to a transcript,
        • Each transcript is related to a gene,
        • Each polypeptide is related to a transcript,
        • Annotation is attached to the polypeptide.
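One way to walk from a gene to its exons is to join feature_relationship twice. The sketch below makes several assumptions that the slide does not state: SQLite stands in for PostgreSQL, the child feature is taken to be the subject and the parent the object, the term names mRNA, part_of and derives_from are illustrative, and all features are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table cvterm (cvterm_id integer primary key, name text);
    create table feature (feature_id integer primary key, type_id integer, uniquename text);
    create table feature_relationship (
        subject_id integer references feature (feature_id),
        object_id  integer references feature (feature_id),
        type_id    integer references cvterm (cvterm_id)
    );
    insert into cvterm values
        (1, 'gene'), (2, 'mRNA'), (3, 'exon'), (4, 'polypeptide'),
        (5, 'part_of'), (6, 'derives_from');
    insert into feature values
        (10, 1, 'toy_gene'),
        (11, 2, 'toy_gene:mRNA'),
        (12, 3, 'toy_gene:exon:1'),
        (13, 3, 'toy_gene:exon:2'),
        (14, 4, 'toy_gene:pep');
    insert into feature_relationship values
        (11, 10, 5),   -- transcript related to gene
        (12, 11, 5),   -- exon related to transcript
        (13, 11, 5),   -- exon related to transcript
        (14, 11, 6);   -- polypeptide related to transcript
""")

# gene <- transcript <- exon: two hops through feature_relationship.
exons = conn.execute("""
    select exon.uniquename
    from feature gene
    join feature_relationship transcript_gene
         on transcript_gene.object_id = gene.feature_id
    join feature_relationship exon_transcript
         on exon_transcript.object_id = transcript_gene.subject_id
    join feature exon
         on exon.feature_id = exon_transcript.subject_id
    join cvterm exon_type
         on exon_type.cvterm_id = exon.type_id
    where gene.uniquename = 'toy_gene'
      and exon_type.name = 'exon'
    order by exon.uniquename
""").fetchall()
print(exons)
```

Filtering on the type of the final feature keeps the polypeptide, which is also related to the transcript, out of the result.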
    36. Annotation
        Feature_cvterm (products)
            feature_id
            cvterm_id

        Featureprop (most other things)
            feature_id
            type_id
            value
    37. Lots more examples
        • Live and direct!