A Guided SQL Tour of Bioinformatics Databases
Yannick Pouliot, PhD
Bioresearch Informationist
lanebioresearch@stanford.edu
Lane Medical Library & Knowledge Management Center
2/28/2007

Lane Medical Library & Knowledge Management Center
http://lane.stanford.edu
Content
Very abbreviated review of the relational principle
Some of the technology required to connect to a remote database
Walk-through of the database schema for Ensembl
  Hands-on querying
Walk-through of the database schema for BioWarehouse
  Hands-on querying
Resources
  Details on connecting to a remote database

So Why Are We Here?

Bioinformatics Databases: Who Supports Direct Querying?

Relational Database Terms
Database: collection of tables and the relationships between tables
Table: collection of records that share a common fundamental characteristic
  E.g., patients and locations can each be stored in their own table
Record: basic unit of information in a relational database
  E.g., one record per person
  A record is composed of columns (“fields”)
Query: set of instructions to a database “engine” to retrieve, sort and format the returned data
  E.g., “find me all patients in my database”
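To make these terms concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for a database engine such as MySQL. The Patients and Locations tables, their columns and their contents are hypothetical; the point is only to show tables, records, fields and a query side by side.

```python
import sqlite3

# An in-memory SQLite database stands in for an engine such as MySQL.
conn = sqlite3.connect(":memory:")

# Two tables, each holding records that share a common characteristic.
conn.execute("CREATE TABLE Patients (ID INTEGER PRIMARY KEY, LastName TEXT, FirstName TEXT)")
conn.execute("CREATE TABLE Locations (ID INTEGER PRIMARY KEY, Name TEXT)")

# One record (row) per person; each record is made of fields (columns).
conn.executemany(
    "INSERT INTO Patients (ID, LastName, FirstName) VALUES (?, ?, ?)",
    [(1, "Smith", "Ann"), (2, "Jones", "Bob")],
)
conn.execute("INSERT INTO Locations (ID, Name) VALUES (1, 'Clinic A')")

# A query: "find me all patients in my database".
all_patients = conn.execute(
    "SELECT LastName, FirstName FROM Patients ORDER BY ID"
).fetchall()
print(all_patients)  # [('Smith', 'Ann'), ('Jones', 'Bob')]
```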

Main Relational Database “Engines”
FileMaker
MS Access
MS SQL Server
MySQL
Oracle
PostgreSQL
Sybase

Structure of Relational DB Tables

Data values live in rows

Understanding the Relational Principle: A Simple Database
[Diagram: related tables connected by a “join”]
Every patient gets ONE record in the Patients table
Every visit gets ONE record in the Visits table
Rows in different tables can be related to one another using a shared key (identifier)
There can be multiple visit records for a given patient
There can be multiple tissue records for a given patient
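A sketch of the Patients/Visits design, again with Python's sqlite3 as an assumed stand-in engine; column names such as VisitDate and all the data are hypothetical. The primary key gives each patient exactly one record, and the REFERENCES clause makes PatientID a shared key that every visit must point through. SQLite only enforces that check after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

# ONE record per patient; ID is the primary key.
conn.execute("CREATE TABLE Patients (ID INTEGER PRIMARY KEY, LastName TEXT)")
# ONE record per visit; PatientID is the shared key pointing back to Patients.ID.
conn.execute(
    "CREATE TABLE Visits (VisitID INTEGER PRIMARY KEY, "
    "PatientID INTEGER NOT NULL REFERENCES Patients(ID), VisitDate TEXT)"
)

conn.executemany("INSERT INTO Patients VALUES (?, ?)", [(1, "Smith"), (2, "Jones")])
# Patient 1 has two visits, patient 2 has one: a one-to-many relationship.
conn.executemany(
    "INSERT INTO Visits VALUES (?, ?, ?)",
    [(10, 1, "2007-01-05"), (11, 1, "2007-02-12"), (12, 2, "2007-02-20")],
)

visits_per_patient = conn.execute(
    "SELECT PatientID, COUNT(*) FROM Visits GROUP BY PatientID ORDER BY PatientID"
).fetchall()
print(visits_per_patient)  # [(1, 2), (2, 1)]

# A visit pointing at a non-existent patient is rejected by the shared-key constraint.
try:
    conn.execute("INSERT INTO Visits VALUES (13, 99, '2007-03-01')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print(orphan_rejected)  # True
```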
The Relational Principle at Work
Related records can be found using a shared key
  Example: Patients.ID = Visits.PatientID
  (Patients = table name; ID = primary key of the Patients table)
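The join condition Patients.ID = Visits.PatientID can be tried on a toy copy of the two tables. This sketch uses hypothetical data in Python's sqlite3 and runs the join twice: once with explicit JOIN … ON syntax, and once in the comma-and-WHERE style used in the BioWarehouse query later in this tour, to show that both forms return the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Patients (ID INTEGER PRIMARY KEY, LastName TEXT)")
conn.execute("CREATE TABLE Visits (VisitID INTEGER PRIMARY KEY, PatientID INTEGER, VisitDate TEXT)")
conn.executemany("INSERT INTO Patients VALUES (?, ?)", [(1, "Smith"), (2, "Jones")])
conn.executemany(
    "INSERT INTO Visits VALUES (?, ?, ?)",
    [(10, 1, "2007-01-05"), (11, 1, "2007-02-12"), (12, 2, "2007-02-20")],
)

# The join condition Patients.ID = Visits.PatientID pairs each visit with its patient.
rows = conn.execute(
    """
    SELECT Patients.LastName, Visits.VisitDate
    FROM Patients
    JOIN Visits ON Patients.ID = Visits.PatientID
    ORDER BY Visits.VisitDate
    """
).fetchall()

# The same join written with a comma-separated FROM list and the condition in WHERE.
rows_where = conn.execute(
    "SELECT Patients.LastName, Visits.VisitDate FROM Patients, Visits "
    "WHERE Patients.ID = Visits.PatientID ORDER BY Visits.VisitDate"
).fetchall()

for last_name, visit_date in rows:
    print(last_name, visit_date)
```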

SQL Querying…With What?
Query browsers used here:
  MySQL Query Browser
  WinSQL
Other, more sophisticated query browsers exist
  Often more expensive or more complex
  Example: PL/SQL Developer, from Allround Automations

Example: Network Querying of the Ensembl Database Using MySQL Query Browser
What happens when you query a remote database?
DEMO
Of note:
  May take some time
    Big database, lots of data to return from far away…
  Easy to write queries with voluminous output
    May have to kill the query…
Setting up ODBC is not discussed here, but cheat-sheet instructions are in the handout; its location will also be mailed
The Database Schema: Your Roadmap for Querying
The schema describes all tables and all fields
  Used to determine how to inter-relate tables to retrieve the desired data
Very important:
  Must understand the schema for accurate querying
  Wrong understanding = wrong results
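One way to read a schema from inside the database is to ask the engine's own catalog. In MySQL, SHOW TABLES and DESCRIBE table_name serve this purpose; the sketch below uses the SQLite equivalents (the sqlite_master table and PRAGMA table_info) on a hypothetical two-table database, so the exact catalog queries will differ on other engines. The helper name describe_schema is illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Patients (ID INTEGER PRIMARY KEY, LastName TEXT, FirstName TEXT)")
conn.execute("CREATE TABLE Visits (VisitID INTEGER PRIMARY KEY, PatientID INTEGER, VisitDate TEXT)")


def describe_schema(connection):
    """Return {table_name: [column names]} by reading SQLite's own catalog."""
    tables = [
        name
        for (name,) in connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    schema = {}
    for table in tables:
        # PRAGMA table_info yields one row per column: (cid, name, type, notnull, default, pk)
        schema[table] = [row[1] for row in connection.execute(f"PRAGMA table_info({table})")]
    return schema


schema = describe_schema(conn)
print(schema)
# {'Patients': ['ID', 'LastName', 'FirstName'], 'Visits': ['VisitID', 'PatientID', 'VisitDate']}
```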

Introducing The SQL Select Statement


Good news: this is the only SQL statement you need to understand for querying
SELECT LastName, FirstName
FROM Patients

Basic Syntax of Select Statement
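SELECT field_name
FROM table
[WHERE condition]
[ ] = optional
Example:
SELECT LastName, FirstName
FROM Patients
WHERE Alive = 'Y';
Note: SQL keywords (SELECT, FROM, WHERE) are not case-sensitive, but table names and string comparisons can be, depending on the engine and its configuration (e.g., MySQL table names on Unix servers)
Query statements are written into a tool such as MS Query or MySQL Query Browser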
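The basic syntax can be exercised on a toy Patients table (hypothetical data, with Python's sqlite3 as a stand-in engine). SQL string literals take straight single quotes ('Y'); curly quotes pasted from a word processor will typically cause a syntax error. When the value comes from a program, passing it as a parameter is generally safer than pasting it into the SQL text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Patients (LastName TEXT, FirstName TEXT, Alive TEXT)")
conn.executemany(
    "INSERT INTO Patients VALUES (?, ?, ?)",
    [("Smith", "Ann", "Y"), ("Jones", "Bob", "N"), ("Lee", "Cy", "Y")],
)

# SELECT field_name FROM table, with no WHERE: every record comes back.
everyone = conn.execute("SELECT LastName, FirstName FROM Patients").fetchall()

# Adding [WHERE condition] keeps only matching records.
living = conn.execute(
    "SELECT LastName, FirstName FROM Patients WHERE Alive = 'Y' ORDER BY LastName"
).fetchall()

# From a program, pass the value as a parameter instead of splicing it into the SQL text.
living_param = conn.execute(
    "SELECT LastName, FirstName FROM Patients WHERE Alive = ? ORDER BY LastName", ("Y",)
).fetchall()

print(len(everyone), living)  # 3 [('Lee', 'Cy'), ('Smith', 'Ann')]
```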
Handout: p2
SELECT – (Some) Details

Moving On: Real Biodatabase Schemas

Schemas We’ll Look At…
Remember: schemas…
  describe all tables and all fields
  are used to determine how to inter-relate tables to retrieve the desired data
Our schemas today:
  Ensembl
  BioWarehouse
Ensembl
Produced by the Wellcome Trust Sanger Institute and EMBL-EBI
Collection of genome databases for many different organisms
Free, open source
Web querying: http://www.ensembl.org/
FAQ: What is Ensembl?
All PubMed references pertaining to Ensembl and written by the Ensembl group

Exploring the Ensembl Schema
Ensembl CORE schema documentation
  First place to go to answer: “what does this table store?”
  Problem: no graphical representation of the overall schema
    Relationships are harder to appreciate
  Use the Catalog function and go from there…


“Fundamental” Tables
Fundamental tables
assembly
assembly_exception
attrib_type
coord_system
dna
dnac
exon
exon_stable_id
exon_transcript
gene
gene_stable_id
karyotype
meta
meta_coord
prediction_exon
prediction_transcript
seq_region
seq_region_attrib
supporting_feature
transcript
transcript_attrib
transcript_stable_id
translation
translation_attrib
translation_stable_id


Features and analyses
alt_allele
analysis
analysis_description
density_feature
density_type
dna_align_feature
map
marker
marker_feature
marker_map_location
marker_synonym
misc_attrib
misc_feature
misc_feature_misc_set
misc_set
prediction_transcript
protein_align_feature
protein_feature
qtl
qtl_feature
qtl_synonym
regulatory_factor
regulatory_factor_coding
regulatory_feature
regulatory_feature_object
regulatory_search_region
repeat_consensus
repeat_feature
simple_feature

ID Mapping (Map identifiers between releases)
gene_archive
mapping_session
peptide_archive
stable_id_event

External references (IDs of objects in other databases)
external_db
external_synonym
go_xref
identity_xref
object_xref
xref

Miscellaneous
interpro

Understanding the Ensembl Schema Using the Catalog

Querying Ensembl
Ensembl runs on the MySQL database engine
We’ll use WinSQL
  MySQL Query Browser can also be used, as well as many other querying tools

Before Proceeding: A Word of Caution
Easy to write queries that…
  Retrieve nonsense
  Never complete
    Scotty to Captain Kirk: “We’re going in circles, and at warp 6 we’re going mighty fast…”
Understanding the schema is the only way to prevent this
Tips:
  Use COUNT to determine the number of rows in a table BEFORE returning large datasets
  Remember: the more tables are joined, the slower the query
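Both tips can be seen on a toy database. In this sketch (hypothetical Patients and Visits tables in Python's sqlite3), COUNT reports the size of a result before any rows are fetched, and forgetting the join condition turns a 2,000-row answer into a million-row one. LIMIT is MySQL and SQLite syntax; other engines use different clauses for the same purpose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Patients (ID INTEGER PRIMARY KEY, LastName TEXT)")
conn.execute("CREATE TABLE Visits (VisitID INTEGER PRIMARY KEY, PatientID INTEGER)")
conn.executemany(
    "INSERT INTO Patients VALUES (?, ?)", [(i, f"patient{i}") for i in range(1, 501)]
)
# Four visits per patient: 2,000 visit rows in total.
conn.executemany(
    "INSERT INTO Visits VALUES (?, ?)", [(v, (v - 1) // 4 + 1) for v in range(1, 2001)]
)

# Tip: COUNT first, so you know how big the answer is before fetching it.
(n_visits,) = conn.execute("SELECT COUNT(*) FROM Visits").fetchone()

# A correct join: one output row per visit.
(n_joined,) = conn.execute(
    "SELECT COUNT(*) FROM Patients, Visits WHERE Patients.ID = Visits.PatientID"
).fetchone()

# The join condition forgotten: every patient is paired with every visit
# (500 x 2,000 rows), which is nonsense and slow on large tables.
(n_runaway,) = conn.execute("SELECT COUNT(*) FROM Patients, Visits").fetchone()

# Once the size is known, LIMIT fetches only a preview.
preview = conn.execute(
    "SELECT VisitID, PatientID FROM Visits ORDER BY VisitID LIMIT 3"
).fetchall()
print(n_visits, n_joined, n_runaway, preview)
# 2000 2000 1000000 [(1, 1), (2, 1), (3, 1)]
```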

Demo Queries… To Get You Started
Query 1: return the number of genes stored in Ensembl Human
Query 2: return the number of transcripts produced by genes stored in Ensembl Human
  Demonstrates JOINing of tables

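The two demo queries follow a common pattern: count one table, then count across a join on the shared key. The sketch below runs that pattern on tiny stand-ins for Ensembl's gene and transcript tables in Python's sqlite3. Only the key columns are modeled, the data are made up, and the real column layout should be checked against the Ensembl core schema documentation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Toy stand-ins: only the key columns needed here (assumed, not the full Ensembl tables).
conn.execute("CREATE TABLE gene (gene_id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE transcript (transcript_id INTEGER PRIMARY KEY, gene_id INTEGER)")
conn.executemany("INSERT INTO gene VALUES (?)", [(1,), (2,), (3,)])
# Gene 1 has three alternative transcripts; genes 2 and 3 have one each.
conn.executemany(
    "INSERT INTO transcript VALUES (?, ?)",
    [(101, 1), (102, 1), (103, 1), (201, 2), (301, 3)],
)

# Query 1: number of genes.
(n_genes,) = conn.execute("SELECT COUNT(*) FROM gene").fetchone()

# Query 2: number of transcripts produced by those genes, joined on the shared key gene_id.
(n_transcripts,) = conn.execute(
    "SELECT COUNT(t.transcript_id) FROM gene g JOIN transcript t ON g.gene_id = t.gene_id"
).fetchone()

# The same join broken down per gene.
per_gene = conn.execute(
    "SELECT g.gene_id, COUNT(t.transcript_id) FROM gene g "
    "JOIN transcript t ON g.gene_id = t.gene_id GROUP BY g.gene_id ORDER BY g.gene_id"
).fetchall()
print(n_genes, n_transcripts, per_gene)  # 3 5 [(1, 3), (2, 1), (3, 1)]
```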
Exercises
Together:
  1. The number of genes stored in Ensembl Human
  2. The number of transcripts produced by genes stored in Ensembl Human
  (10 min)
On your own:
  3. The types of analyses that Ensembl provides
  4. The number of types of markers
  5. The number of markers per chromosome, for all chromosomes
  6. Extra points: the minimum and maximum marker distances for markers on chromosome 19
  (20 min)
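Exercise 5 combines a join with GROUP BY. The sketch shows the general pattern on hypothetical, pared-down seq_region and marker_feature tables in Python's sqlite3. In the real Ensembl schema seq_region also holds non-chromosome regions, so an additional coordinate-system filter is likely needed, and the exact columns should be checked in the schema documentation. COUNT(DISTINCT marker_id) counts markers rather than marker placements.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified stand-ins for Ensembl's seq_region and marker_feature tables.
conn.execute("CREATE TABLE seq_region (seq_region_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE marker_feature (marker_feature_id INTEGER PRIMARY KEY, "
    "marker_id INTEGER, seq_region_id INTEGER)"
)
conn.executemany("INSERT INTO seq_region VALUES (?, ?)", [(1, "1"), (2, "2"), (3, "X")])
conn.executemany(
    "INSERT INTO marker_feature VALUES (?, ?, ?)",
    [(1, 500, 1), (2, 501, 1), (3, 502, 1), (4, 503, 2), (5, 504, 3), (6, 505, 3)],
)

# GROUP BY collapses the joined rows into one output row per chromosome.
markers_per_chromosome = conn.execute(
    """
    SELECT s.name AS chromosome, COUNT(DISTINCT m.marker_id) AS n_markers
    FROM marker_feature m
    JOIN seq_region s ON m.seq_region_id = s.seq_region_id
    GROUP BY s.name
    ORDER BY s.name
    """
).fetchall()
print(markers_per_chromosome)  # [('1', 3), ('2', 1), ('X', 2)]
```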


SELECT Statement: A Refresher
SELECT [DISTINCT] select_list
FROM table_list
[WHERE conditions]
[START WITH] [CONNECT BY]    (Oracle only)
[GROUP BY group_by_list]
[HAVING search_conditions]
[ORDER BY order_list [ASC | DESC] ]
“Modifiers” of the select list:
  DISTINCT
  COUNT
  SUM
  MIN
  MAX
Also:
  ORDER BY
  LIKE (used in the WHERE clause)
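The clauses and modifiers above can be combined in one statement. This sketch uses a hypothetical markers table (made-up names and positions) in Python's sqlite3: WHERE … LIKE filters rows before grouping, HAVING filters whole groups after COUNT is computed, and DISTINCT removes duplicate values. START WITH / CONNECT BY are left out because SQLite does not support them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical toy table: one row per marker placement.
conn.execute("CREATE TABLE markers (name TEXT, chromosome TEXT, position INTEGER)")
conn.executemany(
    "INSERT INTO markers VALUES (?, ?, ?)",
    [
        ("stsA", "1", 1000), ("stsB", "1", 5000), ("stsC", "1", 9000), ("rhA", "1", 4000),
        ("stsD", "2", 2000), ("stsE", "2", 7000),
        ("stsF", "X", 3000),
    ],
)

summary = conn.execute(
    """
    SELECT chromosome, COUNT(*) AS n, MIN(position), MAX(position), SUM(position)
    FROM markers
    WHERE name LIKE 'sts%'        -- LIKE: % matches any run of characters
    GROUP BY chromosome
    HAVING COUNT(*) >= 2          -- HAVING filters groups; WHERE filtered rows
    ORDER BY n DESC
    """
).fetchall()

chromosomes = conn.execute("SELECT DISTINCT chromosome FROM markers ORDER BY chromosome").fetchall()
print(summary)      # [('1', 3, 1000, 9000, 15000), ('2', 2, 2000, 7000, 9000)]
print(chromosomes)  # [('1',), ('2',), ('X',)]
```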

Example of a Biologically Useful Query: All Markers on Chromosome 1
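A query of this kind joins marker locations to chromosome names. The sketch below is an illustration under stated assumptions, not the slide's actual query: it uses pared-down, hypothetical versions of seq_region, marker_feature and marker_synonym in Python's sqlite3, with made-up marker names and positions. Because a real marker can have several synonyms, joining on marker_synonym may return one row per synonym.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, pared-down versions of three Ensembl tables (columns assumed for illustration).
conn.executescript(
    """
    CREATE TABLE seq_region (seq_region_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE marker_synonym (marker_id INTEGER, name TEXT);
    CREATE TABLE marker_feature (marker_id INTEGER, seq_region_id INTEGER,
                                 seq_region_start INTEGER, seq_region_end INTEGER);
    INSERT INTO seq_region VALUES (1, '1'), (2, '2');
    INSERT INTO marker_synonym VALUES (10, 'markerA'), (11, 'markerB'), (12, 'markerC');
    INSERT INTO marker_feature VALUES (10, 1, 2000, 2200), (11, 1, 500, 690), (12, 2, 100, 300);
    """
)

markers_chr1 = conn.execute(
    """
    SELECT ms.name, mf.seq_region_start, mf.seq_region_end
    FROM marker_feature mf
    JOIN seq_region s      ON mf.seq_region_id = s.seq_region_id
    JOIN marker_synonym ms ON mf.marker_id = ms.marker_id
    WHERE s.name = '1'
    ORDER BY mf.seq_region_start
    """
).fetchall()
print(markers_chr1)  # [('markerB', 500, 690), ('markerA', 2000, 2200)]
```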

Now We’re Talking: Returning Results into Your Favorite Tool
SQL query results returned to…
  MS Excel
    … using Data/Import External Data/New Database Query
    Details: Excel Advanced Report Development, Zapawa 2005 (in Lane catalog)
  Spotfire
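Excel's Data/Import route pulls results through ODBC. As a swapped-in alternative that needs no driver setup, this sketch writes query results to a CSV file that Excel can open. It uses Python's standard sqlite3 and csv modules; the Patients table and the helper name query_to_csv are hypothetical.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Patients (LastName TEXT, FirstName TEXT, Alive TEXT)")
conn.executemany(
    "INSERT INTO Patients VALUES (?, ?, ?)",
    [("Smith", "Ann", "Y"), ("Jones", "Bob", "N")],
)


def query_to_csv(connection, sql, out):
    """Run `sql` and write a header row plus all result rows to the file-like `out`."""
    cursor = connection.execute(sql)
    writer = csv.writer(out)
    writer.writerow([col[0] for col in cursor.description])  # column names
    writer.writerows(cursor)


buffer = io.StringIO()
query_to_csv(conn, "SELECT LastName, FirstName FROM Patients ORDER BY LastName", buffer)
print(buffer.getvalue())

# To produce a file for Excel, pass a real file instead:
# with open("patients.csv", "w", newline="") as f:
#     query_to_csv(conn, "SELECT LastName, FirstName FROM Patients", f)
```

Writing to a file-like object keeps the helper testable in memory; the commented-out lines show the same call against a real file.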
Next: BioWarehouse
Produced by SRI International
Integration of genome, biochemical reaction, pathway and other databases from many different organisms
Free, open source
Accessing PublicHouse
FAQ
Schema
All PubMed references pertaining to BioWarehouse and written by the BioWarehouse group
Conceptual Views of the BioWarehouse Database

Querying BioWarehouse
We’ll query using MySQL Query Browser
Caveats:
  Lots of datasets supported by BioWarehouse…
    … but some critical ones are missing from PublicHouse due to licensing requirements, e.g.,
      MetaCyc
      UniProt
  Also: need to request an account to query
    Anonymous user not supported
Resource: MySQL v5 Reference Manual
BioWarehouse Demo Queries… to Get You Started
Query 1: What datasets are available in PublicHouse?
Query 2: How many pathways are there in the EcoCyc dataset?
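Both demo queries can be sketched against toy copies of BioWarehouse's DataSet and Pathway tables in Python's sqlite3. The column names WID, Name and DataSetWID are assumptions modeled on the pathway query on the next slide, not a verified copy of the schema, and the data are made up. Joining on the dataset name, rather than hard-coding a numeric DataSetWID, avoids depending on an identifier that may differ between installations.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Toy stand-ins for BioWarehouse's DataSet and Pathway tables (columns assumed; check the schema).
conn.executescript(
    """
    CREATE TABLE DataSet (WID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Pathway (WID INTEGER PRIMARY KEY, Name TEXT, DataSetWID INTEGER);
    INSERT INTO DataSet VALUES (1, 'EcoCyc'), (2, 'OtherCyc');
    INSERT INTO Pathway VALUES (100, 'pathway A', 1), (101, 'pathway B', 1), (102, 'pathway C', 2);
    """
)

# Query 1: which datasets are loaded?
datasets = [name for (name,) in conn.execute("SELECT Name FROM DataSet ORDER BY Name")]

# Query 2: how many pathways belong to EcoCyc? Join on the dataset rather than hard-coding its WID.
(n_ecocyc_pathways,) = conn.execute(
    """
    SELECT COUNT(*)
    FROM Pathway p
    JOIN DataSet d ON p.DataSetWID = d.WID
    WHERE d.Name = 'EcoCyc'
    """
).fetchone()
print(datasets, n_ecocyc_pathways)  # ['EcoCyc', 'OtherCyc'] 2
```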

Example Biologically Meaningful Query of BioWarehouse: For a Given Pathway, Return the Proteins Involved in the Pathway and Their Molecular Weights

SELECT D.Name AS PathwayName,
       J.WID AS ProteinWID,
       J.Name AS ProteinName,
       J.MolecularWeightCalc AS MolecularWeightCalc
FROM Pathway D, PathwayReaction F, Reaction G, EnzymaticReaction H, Protein J
WHERE D.WID = F.PathwayWID
  AND F.ReactionWID = G.WID
  AND G.WID = H.ReactionWID
  AND H.ProteinWID = J.WID
  AND D.DataSetWID = 19
  AND D.Name LIKE '%lipopolysaccharide%'
ORDER BY ProteinName
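The same query can be written with explicit JOIN … ON clauses, which keeps each join condition next to the table it connects and makes a forgotten condition easier to spot. The sketch runs that rewritten form on tiny, made-up copies of the five tables in Python's sqlite3. Only the columns the query uses are modeled, and the values are illustrative, not BioWarehouse data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Toy versions of the five tables in the query, keeping only the columns it uses.
conn.executescript(
    """
    CREATE TABLE Pathway (WID INTEGER PRIMARY KEY, Name TEXT, DataSetWID INTEGER);
    CREATE TABLE PathwayReaction (PathwayWID INTEGER, ReactionWID INTEGER);
    CREATE TABLE Reaction (WID INTEGER PRIMARY KEY);
    CREATE TABLE EnzymaticReaction (ReactionWID INTEGER, ProteinWID INTEGER);
    CREATE TABLE Protein (WID INTEGER PRIMARY KEY, Name TEXT, MolecularWeightCalc REAL);

    INSERT INTO Pathway VALUES (1, 'lipopolysaccharide biosynthesis', 19), (2, 'glycolysis', 19);
    INSERT INTO PathwayReaction VALUES (1, 10), (1, 11), (2, 12);
    INSERT INTO Reaction VALUES (10), (11), (12);
    INSERT INTO EnzymaticReaction VALUES (10, 100), (11, 101), (12, 102);
    INSERT INTO Protein VALUES (100, 'protein B', 41.2), (101, 'protein A', 28.7), (102, 'protein C', 55.0);
    """
)

rows = conn.execute(
    """
    SELECT D.Name AS PathwayName, J.WID AS ProteinWID,
           J.Name AS ProteinName, J.MolecularWeightCalc AS MolecularWeightCalc
    FROM Pathway D
    JOIN PathwayReaction F   ON D.WID = F.PathwayWID
    JOIN Reaction G          ON F.ReactionWID = G.WID
    JOIN EnzymaticReaction H ON G.WID = H.ReactionWID
    JOIN Protein J           ON H.ProteinWID = J.WID
    WHERE D.DataSetWID = 19
      AND D.Name LIKE '%lipopolysaccharide%'
    ORDER BY ProteinName
    """
).fetchall()
for row in rows:
    print(row)
```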
Exercises
Together:
  1. How many datasets are there in PublicHouse?
  2. What is the number of genes in S. aureus (SAUR158878Cyc)?
  (10 min)
On your own:
  3. List the coding-region starts and ends for all genes that code for proteins in the SAUR158878Cyc dataset
  4. How many biochemical reactions are there in each pathway (of any type) in the EcoCyc (= E. coli) dataset?
  (20 min)
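Exercise 4 is a per-group count across a join. This sketch uses hypothetical copies of DataSet, Pathway and PathwayReaction (columns assumed, data made up) in Python's sqlite3. A LEFT JOIN keeps pathways that have no reactions, and COUNT on a column from the joined table reports them as 0 instead of dropping them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Toy stand-ins (assumed columns) for BioWarehouse's DataSet, Pathway and PathwayReaction tables.
conn.executescript(
    """
    CREATE TABLE DataSet (WID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Pathway (WID INTEGER PRIMARY KEY, Name TEXT, DataSetWID INTEGER);
    CREATE TABLE PathwayReaction (PathwayWID INTEGER, ReactionWID INTEGER);
    INSERT INTO DataSet VALUES (1, 'EcoCyc');
    INSERT INTO Pathway VALUES (100, 'pathway A', 1), (101, 'pathway B', 1), (102, 'pathway C', 1);
    INSERT INTO PathwayReaction VALUES (100, 1), (100, 2), (100, 3), (101, 4);
    """
)

reactions_per_pathway = conn.execute(
    """
    SELECT p.Name, COUNT(pr.ReactionWID) AS n_reactions
    FROM Pathway p
    JOIN DataSet d ON p.DataSetWID = d.WID
    LEFT JOIN PathwayReaction pr ON pr.PathwayWID = p.WID
    WHERE d.Name = 'EcoCyc'
    GROUP BY p.WID, p.Name
    ORDER BY n_reactions DESC, p.Name
    """
).fetchall()
print(reactions_per_pathway)  # [('pathway A', 3), ('pathway B', 1), ('pathway C', 0)]
```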


In Summary…
Knowing the database schema is essential
The SELECT statement is all you need to know
Remote databases are good for exploring a schema at low cost
  No installation…
But:
  Performance can be poor
  Restrictions on datasets
  Better to install locally if “real work” is to be performed
Remember: SQL gives you the power to return results directly into your favorite tool!

Don’t Forget the Class Evaluation

Resources

Setting Up for Internet SQL Querying
Setting Up Data Source Names
Steps
1. Make sure you have the requisite driver (next slide)
2. Create a Data Source Name (Windows only)
3. Write your query
4. Get the results back into Excel!
See the Lane videorecorded class Managing Experiment Data Using Excel and Friends: Digging Out from Under the Avalanche for many more details.
Step 1: Getting Drivers Essential for SQL Querying
A driver is a piece of software that lets your operating system talk to a database
Installed drivers are visible in the ODBC manager (a “data connectivity” tool)
Each database engine (Oracle, MySQL, etc.) requires its own driver
  Generally must be installed by the user
  Installation is simple
Drivers are needed by the Data Source Name tool and by querying programs
MySQL Driver: Needed to Query MySQL Databases
Windows: Download MySQL Connector/ODBC 3.51 here
  Must be installed for direct querying using, e.g., Excel
  Not necessary if you are using the MySQL Query Browser

Oracle Driver: Needed to Query Oracle Databases
Installing the “client” software will also install the driver
  Windows: Download 10g Client here
  Mac: Download 10g Client here
  A free Oracle user account is required to download
Must be installed if you are querying Oracle using MS Query or any other query browser
Step 2: Creating a Data Source Name
A Data Source Name (DSN) tells programs on your PC where and how to query a database
Populating the fields:
  Data Source Name: unique name of your choice
  Description: anything
  Server: exactly as given by the database provider
  Port number: as specified by the database provider
    Defaults: MySQL: 3306; Oracle: 1521; MS Access: N/A
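The DSN dialog collects a driver, server, port, database and credentials. As an alternative that skips the Windows DSN tool, many ODBC programs accept the same information as a DSN-less connection string. The function below is a hypothetical helper that builds such a string. The keyword names follow common MySQL Connector/ODBC usage, and the driver name is an assumption that must match what your ODBC manager lists. A library such as pyodbc can pass the resulting string to pyodbc.connect (not run here).

```python
# Default TCP ports listed on the slide; MS Access is file-based, so it has none.
DEFAULT_PORTS = {"MySQL": 3306, "Oracle": 1521, "MS Access": None}


def mysql_odbc_connection_string(server, database, user, password, port=None,
                                 driver="MySQL ODBC 3.51 Driver"):
    """Build a DSN-less ODBC connection string carrying the same fields as the DSN dialog.

    Hypothetical helper: keyword names (DRIVER, SERVER, PORT, DATABASE, UID, PWD) follow
    common Connector/ODBC usage; the driver name must match your ODBC manager's listing.
    """
    if port is None:
        port = DEFAULT_PORTS["MySQL"]
    parts = {
        "DRIVER": "{" + driver + "}",
        "SERVER": server,
        "PORT": str(port),
        "DATABASE": database,
        "UID": user,
        "PWD": password,
    }
    return ";".join(f"{key}={value}" for key, value in parts.items()) + ";"


conn_str = mysql_odbc_connection_string("db.example.org", "some_database", "guest", "")
print(conn_str)
# DRIVER={MySQL ODBC 3.51 Driver};SERVER=db.example.org;PORT=3306;DATABASE=some_database;UID=guest;PWD=;
```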

Resources – SQL



eBook: Beginning SQL
eBook: Learning SQL

Lots More Resources From Lane

How To Get Accounts for Direct SQL Querying
Direct Querying of Selected Bioinformatics Databases
Database | How? | DB Engine
BioWarehouse | Go to http://biowarehouse.ai.sri.com/ and get an account for access to PublicHouse (the publicly accessible installation of BioWarehouse; see http://biowarehouse.ai.sri.com/PublicHouseOverview.html) | MySQL
Ensembl | http://www.ensembl.org/info/data/download.html | MySQL
Mouse Genome Database | Mail mgi-help@informatics.jax.org to ask for an account | Sybase
Example Querying with MySQL Query Browser
Free
MySQL only
Graphical tool that facilitates writing SQL queries
[Screenshot callouts: Query statement; Execute statement; Table descriptions]
Get it at http://www.mysql.com/products/tools/querybrowser/

 
Call Girls Jayanagar Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Jayanagar Just Call 7001305949 Top Class Call Girl Service AvailableCall Girls Jayanagar Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Jayanagar Just Call 7001305949 Top Class Call Girl Service Available
 
Glomerular Filtration rate and its determinants.pptx
Glomerular Filtration rate and its determinants.pptxGlomerular Filtration rate and its determinants.pptx
Glomerular Filtration rate and its determinants.pptx
 
Call Girl Bangalore Nandini 7001305949 Independent Escort Service Bangalore
Call Girl Bangalore Nandini 7001305949 Independent Escort Service BangaloreCall Girl Bangalore Nandini 7001305949 Independent Escort Service Bangalore
Call Girl Bangalore Nandini 7001305949 Independent Escort Service Bangalore
 
Asthma Review - GINA guidelines summary 2024
Asthma Review - GINA guidelines summary 2024Asthma Review - GINA guidelines summary 2024
Asthma Review - GINA guidelines summary 2024
 
Call Girls ITPL Just Call 7001305949 Top Class Call Girl Service Available
Call Girls ITPL Just Call 7001305949 Top Class Call Girl Service AvailableCall Girls ITPL Just Call 7001305949 Top Class Call Girl Service Available
Call Girls ITPL Just Call 7001305949 Top Class Call Girl Service Available
 
Call Girls Service Noida Maya 9711199012 Independent Escort Service Noida
Call Girls Service Noida Maya 9711199012 Independent Escort Service NoidaCall Girls Service Noida Maya 9711199012 Independent Escort Service Noida
Call Girls Service Noida Maya 9711199012 Independent Escort Service Noida
 
Call Girls Electronic City Just Call 7001305949 Top Class Call Girl Service A...
Call Girls Electronic City Just Call 7001305949 Top Class Call Girl Service A...Call Girls Electronic City Just Call 7001305949 Top Class Call Girl Service A...
Call Girls Electronic City Just Call 7001305949 Top Class Call Girl Service A...
 
Call Girls Service Chennai Jiya 7001305949 Independent Escort Service Chennai
Call Girls Service Chennai Jiya 7001305949 Independent Escort Service ChennaiCall Girls Service Chennai Jiya 7001305949 Independent Escort Service Chennai
Call Girls Service Chennai Jiya 7001305949 Independent Escort Service Chennai
 
VIP Call Girls Mumbai Arpita 9910780858 Independent Escort Service Mumbai
VIP Call Girls Mumbai Arpita 9910780858 Independent Escort Service MumbaiVIP Call Girls Mumbai Arpita 9910780858 Independent Escort Service Mumbai
VIP Call Girls Mumbai Arpita 9910780858 Independent Escort Service Mumbai
 
College Call Girls Pune Mira 9907093804 Short 1500 Night 6000 Best call girls...
College Call Girls Pune Mira 9907093804 Short 1500 Night 6000 Best call girls...College Call Girls Pune Mira 9907093804 Short 1500 Night 6000 Best call girls...
College Call Girls Pune Mira 9907093804 Short 1500 Night 6000 Best call girls...
 
Call Girls Jp Nagar Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Jp Nagar Just Call 7001305949 Top Class Call Girl Service AvailableCall Girls Jp Nagar Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Jp Nagar Just Call 7001305949 Top Class Call Girl Service Available
 
call girls in green park DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
call girls in green park  DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️call girls in green park  DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
call girls in green park DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
 
See the 2,456 pharmacies on the National E-Pharmacy Platform
See the 2,456 pharmacies on the National E-Pharmacy PlatformSee the 2,456 pharmacies on the National E-Pharmacy Platform
See the 2,456 pharmacies on the National E-Pharmacy Platform
 
Book Call Girls in Kasavanahalli - 7001305949 with real photos and phone numbers
Book Call Girls in Kasavanahalli - 7001305949 with real photos and phone numbersBook Call Girls in Kasavanahalli - 7001305949 with real photos and phone numbers
Book Call Girls in Kasavanahalli - 7001305949 with real photos and phone numbers
 
Call Girls Frazer Town Just Call 7001305949 Top Class Call Girl Service Avail...
Call Girls Frazer Town Just Call 7001305949 Top Class Call Girl Service Avail...Call Girls Frazer Town Just Call 7001305949 Top Class Call Girl Service Avail...
Call Girls Frazer Town Just Call 7001305949 Top Class Call Girl Service Avail...
 
Call Girls Thane Just Call 9910780858 Get High Class Call Girls Service
Call Girls Thane Just Call 9910780858 Get High Class Call Girls ServiceCall Girls Thane Just Call 9910780858 Get High Class Call Girls Service
Call Girls Thane Just Call 9910780858 Get High Class Call Girls Service
 
Bangalore Call Girls Marathahalli 📞 9907093804 High Profile Service 100% Safe
Bangalore Call Girls Marathahalli 📞 9907093804 High Profile Service 100% SafeBangalore Call Girls Marathahalli 📞 9907093804 High Profile Service 100% Safe
Bangalore Call Girls Marathahalli 📞 9907093804 High Profile Service 100% Safe
 
VIP Call Girls Lucknow Nandini 7001305949 Independent Escort Service Lucknow
VIP Call Girls Lucknow Nandini 7001305949 Independent Escort Service LucknowVIP Call Girls Lucknow Nandini 7001305949 Independent Escort Service Lucknow
VIP Call Girls Lucknow Nandini 7001305949 Independent Escort Service Lucknow
 
Call Girl Service Bidadi - For 7001305949 Cheap & Best with original Photos
Call Girl Service Bidadi - For 7001305949 Cheap & Best with original PhotosCall Girl Service Bidadi - For 7001305949 Cheap & Best with original Photos
Call Girl Service Bidadi - For 7001305949 Cheap & Best with original Photos
 

Guided Tour of Bioinformatics Databases

  • 1. A Guided SQL Tour of Bioinformatics Databases. Yannick Pouliot, PhD, Bioresearch Informationist (lanebioresearch@stanford.edu), Lane Medical Library & Knowledge Management Center, http://lane.stanford.edu. 2/28/2007
  • 2. Content: a very abbreviated review of the relational principle; some of the technology required to connect to a remote database; a walk-through of the Ensembl database schema, followed by hands-on querying; a walk-through of the BioWarehouse database schema, followed by hands-on querying; resources, including details on connecting to a remote database.
  • 3. So Why Are We Here?
  • 4. Bioinformatics Databases: Who Supports Direct Querying?
  • 5. Relational Database Terms. Database: a collection of tables and the relationships between them. Table: a collection of records that share a common fundamental characteristic (e.g., patients and locations can each be stored in their own table). Record: the basic unit of information in a relational database (e.g., one record per person); a record is composed of columns (“fields”). Query: a set of instructions to a database “engine” to retrieve, sort and format the returned data (e.g., “find me all patients in my database”).
  • 6. Main Relational Database “Engines”: FileMaker, MS Access, MS SQL Server, MySQL, Oracle, PostgreSQL, Sybase
  • 7. Structure of Relational DB Tables: data values live in rows
  • 8. Understanding the Relational Principle: A Simple Database. Every patient gets ONE record in the Patients table, and every visit gets ONE record in the Visits table. Rows in different tables can be related to one another using a shared key (identifier); this is what a “join” does. There can be multiple visit records for a given patient, and multiple tissue records for a given patient.
  • 9. The Relational Principle at Work: related records can be found using a shared key. Example: Patients.ID = Visits.PatientID, where Patients.ID is the primary key of the Patients table.
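The Patients/Visits example can be tried without any database server. Below is a minimal sketch, assuming Python's built-in sqlite3 module as a local stand-in for a real engine; the column names Patients.ID and Visits.PatientID come from the slide, while VisitID, VisitDate and the sample rows are hypothetical, made up for illustration.

```python
import sqlite3

# In-memory database with a hypothetical Patients/Visits schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Patients (ID INTEGER PRIMARY KEY, LastName TEXT, FirstName TEXT);
    CREATE TABLE Visits (VisitID INTEGER PRIMARY KEY,
                         PatientID INTEGER REFERENCES Patients(ID),
                         VisitDate TEXT);
    INSERT INTO Patients VALUES (1, 'Smith', 'Ann'), (2, 'Jones', 'Bob');
    -- One patient can have many visits; the shared key is Visits.PatientID.
    INSERT INTO Visits VALUES (10, 1, '2007-01-05'), (11, 1, '2007-02-10'),
                              (12, 2, '2007-02-12');
""")

# Related records are found by matching the shared key: Patients.ID = Visits.PatientID.
rows = conn.execute("""
    SELECT Patients.LastName, Visits.VisitDate
    FROM Patients
    JOIN Visits ON Patients.ID = Visits.PatientID
    ORDER BY Visits.VisitDate
""").fetchall()

for last_name, visit_date in rows:
    print(last_name, visit_date)
```

Patient Smith appears twice in the result because she has two Visits records: a join returns one row per matching pair, not one row per patient.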
  • 10. SQL Querying… With What? Query browsers used here: MySQL Query Browser and WinSQL. Other, more sophisticated query browsers exist, but they are often more expensive or more complex (example: PL/SQL Developer, from Allround Automations).
  • 11. Example: Network Querying of the Ensembl Database Using MySQL Query Browser. What happens when you query a remote database? (DEMO) Of note: queries may take some time, since this is a big database with lots of data to return from far away. It is easy to write queries with voluminous output, and you may have to kill the query. Setting up ODBC is not discussed here, but cheat-sheet instructions are in the handout; their location will also be mailed.
  • 12. The Database Schema: Your Roadmap For Querying. The schema describes all tables and all fields and is used to determine how to inter-relate tables to retrieve the desired data. Very important: you must understand the schema to query accurately; a wrong understanding means wrong results.
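A schema can also be explored from a query tool itself. In MySQL, SHOW TABLES and DESCRIBE table_name list the tables and their fields. The sketch below shows the same idea with Python's sqlite3, where the catalog lives in sqlite_master and field lists come from the SQLite-specific PRAGMA statements; the two tables are a hypothetical, heavily simplified stand-in for Ensembl's gene and transcript tables, not the real Ensembl definitions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE gene (gene_id INTEGER PRIMARY KEY, biotype TEXT);
    CREATE TABLE transcript (transcript_id INTEGER PRIMARY KEY,
                             gene_id INTEGER REFERENCES gene(gene_id));
""")

def describe_schema(conn):
    """Return {table_name: [column names]} for every table in the database."""
    tables = [name for (name,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    schema = {}
    for table in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        schema[table] = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    return schema

# Declared foreign keys show which fields link one table to another:
# each row is (id, seq, table, from, to, on_update, on_delete, match).
links = conn.execute("PRAGMA foreign_key_list(transcript)").fetchall()

schema = describe_schema(conn)
print(schema)
print(links)
```

Here the foreign key tells you that transcript.gene_id points at gene.gene_id, which is exactly the information needed to join the two tables.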
  • 13. Introducing The SQL SELECT Statement. Good news: this is the only SQL statement you need to understand for querying. Example: SELECT LastName, FirstName FROM Patients
  • 14. Basic Syntax of the SELECT Statement: SELECT field_name FROM table [WHERE condition], where [ ] = optional. Example: SELECT LastName, FirstName FROM Patients WHERE Alive = 'Y'; Note: case sensitive for all but Oracle. Query statements are written into a tool such as MS Query or MySQL Query Browser. (Handout: p. 2)
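The slide's example query can be run as-is against a small table. A minimal sketch, assuming Python's sqlite3 as the engine and a hypothetical Patients table with an Alive column holding 'Y' or 'N' (the sample rows are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Patients (ID INTEGER PRIMARY KEY, LastName TEXT,
                           FirstName TEXT, Alive TEXT);
    INSERT INTO Patients VALUES (1, 'Smith', 'Ann', 'Y'),
                                (2, 'Jones', 'Bob', 'N'),
                                (3, 'Brown', 'Cal', 'Y');
""")

# SELECT field_name FROM table [WHERE condition]
# String literals use straight single quotes: 'Y', not typographic quotes.
alive = conn.execute(
    "SELECT LastName, FirstName FROM Patients WHERE Alive = 'Y' ORDER BY ID"
).fetchall()

# Without the optional WHERE clause, every record is returned.
everyone = conn.execute("SELECT LastName, FirstName FROM Patients").fetchall()

print(alive)
print(len(everyone))
```

The ORDER BY is added only so the output order is predictable; without it, SQL makes no promise about row order.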
  • 15. SELECT – (Some) Details
  • 16. Moving On: Real Biodatabase Schemas
  • 17. Schemas We’ll Look At… Remember: schemas describe all tables and all fields and are used to determine how to inter-relate tables to retrieve the desired data. Our schemas today: Ensembl and BioWarehouse.
  • 18. Ensembl: produced by the Sanger Institute; a collection of genome databases for many different organisms; free, open source. Web querying: http://www.ensembl.org/. See also: FAQ: What is Ensembl?; all PubMed references pertaining to Ensembl and written by the Ensembl group.
  • 19. Exploring the Ensembl Schema. The Ensembl CORE schema documentation is the first place to go to answer “what does this table store?” Problem: there is no graphical representation of the overall schema, so relationships are harder to appreciate. Use the Catalog function and go from there…
  • 20. “Fundamental” Tables
    ◦ Fundamental tables: assembly, assembly_exception, attrib_type, coord_system, dna, dnac, exon, exon_stable_id, exon_transcript, gene, gene_stable_id, karyotype, meta, meta_coord, prediction_exon, prediction_transcript, seq_region, seq_region_attrib, supporting_feature, transcript, transcript_attrib, transcript_stable_id, translation, translation_attrib, translation_stable_id
    ◦ Features and analyses: alt_allele, analysis, analysis_description, density_feature, density_type, dna_align_feature, map, marker, marker_feature, marker_map_location, marker_synonym, misc_attrib, misc_feature, misc_feature_misc_set, misc_set, prediction_transcript, protein_align_feature, protein_feature, qtl, qtl_feature, qtl_synonym, regulatory_factor, regulatory_factor_coding, regulatory_feature, regulatory_feature_object, regulatory_search_region, repeat_consensus, repeat_feature, simple_feature
    ◦ ID mapping (map identifiers between releases): gene_archive, mapping_session, peptide_archive, stable_id_event
    ◦ External references (IDs to objects in other dbs): external_db, external_synonym, go_xref, identity_xref, object_xref, xref
    ◦ Miscellaneous: interpro
  • 21. Understanding The Ensembl Schema Using The Catalog
  • 22. Querying Ensembl. Ensembl runs on the MySQL database engine. We’ll use WinSQL; MySQL Query Browser can also be used, as well as lots of other querying tools.
  • 23. Before Proceeding: A Word of Caution. It is easy to write queries that retrieve nonsense or never complete. (Scotty to Captain Kirk: “We’re going in circles, and at warp 6 we’re going mighty fast…”) Understanding the schema is the only way to prevent this. Tips: use COUNT to determine the number of rows in a table BEFORE returning large datasets; remember that the more tables are joined, the slower the query.
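The “count first” tip can be made into a habit. A minimal sketch, assuming Python's sqlite3 and a hypothetical marker table filled with 5,000 made-up rows; the helper name safe_fetch and the 100-row threshold are illustrative choices, not part of any tool mentioned in the slides. LIMIT is MySQL and SQLite syntax; Oracle uses a different construct (e.g., ROWNUM).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE marker (marker_id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO marker (name) VALUES (?)",
                 [(f"D19S{i}",) for i in range(5000)])

def safe_fetch(conn, table, max_rows=100):
    """Count the rows first; fetch everything only if the result is small."""
    # Table names cannot be bound as parameters, so `table` must be trusted input.
    (n,) = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()
    if n > max_rows:
        # Too big: return only a preview of the first max_rows rows.
        rows = conn.execute(f"SELECT * FROM {table} LIMIT {int(max_rows)}").fetchall()
    else:
        rows = conn.execute(f"SELECT * FROM {table}").fetchall()
    return n, rows

total, preview = safe_fetch(conn, "marker")
print(total, len(preview))
```

On a remote server the COUNT(*) itself still has to scan the table, but it sends back a single number instead of thousands of rows.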
  • 24. Demo Queries… To Get You Started. Query 1: return the number of genes stored in Ensembl Human. Query 2: return the number of transcripts produced by genes stored in Ensembl Human (demonstrates JOINing of tables).
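The two demo queries can be sketched against toy tables. This assumes Python's sqlite3 and a hypothetical, heavily reduced version of the Ensembl gene and transcript tables in which transcript.gene_id points to gene.gene_id; the real Ensembl tables have many more columns, and the rows here are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE gene (gene_id INTEGER PRIMARY KEY, biotype TEXT);
    CREATE TABLE transcript (transcript_id INTEGER PRIMARY KEY, gene_id INTEGER);
    INSERT INTO gene VALUES (1, 'protein_coding'), (2, 'protein_coding'), (3, 'miRNA');
    -- Gene 1 has two transcripts; genes 2 and 3 have one each.
    INSERT INTO transcript VALUES (100, 1), (101, 1), (102, 2), (103, 3);
""")

# Query 1: number of genes.
(n_genes,) = conn.execute("SELECT COUNT(*) FROM gene").fetchone()

# Query 2: number of transcripts produced by genes, via a JOIN on gene_id.
(n_transcripts,) = conn.execute("""
    SELECT COUNT(*)
    FROM gene g
    JOIN transcript t ON g.gene_id = t.gene_id
""").fetchone()

# Going one step further: transcripts per gene, using GROUP BY.
per_gene = conn.execute("""
    SELECT g.gene_id, COUNT(t.transcript_id)
    FROM gene g
    JOIN transcript t ON g.gene_id = t.gene_id
    GROUP BY g.gene_id
    ORDER BY g.gene_id
""").fetchall()

print(n_genes, n_transcripts, per_gene)
```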
  • 25. Exercises. Together (10 min): 1. the number of genes stored in Ensembl Human; 2. the number of transcripts produced by genes stored in Ensembl Human. On your own (20 min): 3. the types of analyses that Ensembl provides; 4. the number of types of markers; 5. the number of markers per chromosome for all chromosomes; 6. extra points: the minimum and maximum marker distances for markers on chromosome 19.
  • 26. SELECT Statement: A Refresher
    SELECT [DISTINCT] select_list FROM table_list [WHERE conditions] [START WITH] [CONNECT BY] [GROUP BY group_by_list] [HAVING search_conditions] [ORDER BY order_list [ASC | DESC]]
    “Modifiers” of the select list: DISTINCT, COUNT, SUM, MIN, MAX. Also: ORDER BY, and LIKE (used in the WHERE clause). START WITH and CONNECT BY are Oracle-only clauses for hierarchical queries.
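The modifiers above map directly onto exercises 5 and 6. A minimal sketch, assuming Python's sqlite3 and a hypothetical marker_map_location table with the chromosome_name and position fields used in the editor's notes; the positions are made-up numbers, and “distance” is read here as the marker position.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE marker_map_location (marker_id INTEGER, chromosome_name TEXT,
                                      position REAL);
    INSERT INTO marker_map_location VALUES
        (1, '1', 10.5), (2, '1', 55.0), (3, '19', 2.3),
        (4, '19', 71.8), (5, '19', 40.1), (6, 'X', 12.0);
""")

# DISTINCT: which chromosomes have markers at all?
chromosomes = [c for (c,) in conn.execute(
    "SELECT DISTINCT chromosome_name FROM marker_map_location ORDER BY chromosome_name")]

# COUNT + GROUP BY: markers per chromosome; HAVING then filters whole groups.
per_chrom = conn.execute("""
    SELECT chromosome_name, COUNT(*) AS n_markers
    FROM marker_map_location
    GROUP BY chromosome_name
    HAVING COUNT(*) >= 2
    ORDER BY n_markers DESC
""").fetchall()

# MIN / MAX with WHERE: smallest and largest marker position on chromosome 19.
lo, hi = conn.execute("""
    SELECT MIN(position), MAX(position)
    FROM marker_map_location
    WHERE chromosome_name = '19'
""").fetchone()

print(chromosomes, per_chrom, lo, hi)
```

Note that WHERE filters rows before grouping, while HAVING filters the groups after COUNT has been computed.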
  • 27. Example Of A Biologically Useful Query: All Markers on Chromosome 1
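The marker query from the editor's notes joins three tables: marker, marker_map_location and map. A minimal sketch of the same join, assuming Python's sqlite3 and hypothetical, reduced versions of those tables (map names 'map_A'/'map_B' and all positions are made up), wrapped in a helper whose name is illustrative only:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE marker (marker_id INTEGER PRIMARY KEY);
    CREATE TABLE map (map_id INTEGER PRIMARY KEY, map_name TEXT);
    CREATE TABLE marker_map_location (marker_id INTEGER, map_id INTEGER,
                                      chromosome_name TEXT, position REAL);
    INSERT INTO marker VALUES (1), (2), (3);
    INSERT INTO map VALUES (1, 'map_A'), (2, 'map_B');
    -- Marker 1 is placed on two different maps.
    INSERT INTO marker_map_location VALUES
        (1, 1, '1', 10.5), (1, 2, '1', 11.0), (2, 1, '1', 55.0), (3, 1, '19', 2.3);
""")

def markers_on_chromosome(conn, chromosome):
    """Markers on one chromosome, with their position and the map they come from."""
    return conn.execute("""
        SELECT m.marker_id, mml.chromosome_name, mml.position, mp.map_name
        FROM marker m
        INNER JOIN marker_map_location mml ON m.marker_id = mml.marker_id
        INNER JOIN map mp ON mml.map_id = mp.map_id
        WHERE mml.chromosome_name = ?
        ORDER BY mml.position
    """, (chromosome,)).fetchall()

rows = markers_on_chromosome(conn, "1")
for row in rows:
    print(row)
```

Marker 1 comes back twice, once per map it is placed on; this is the kind of row multiplication that the schema has to explain before the results can be trusted.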
  • 28. Now We’re Talking: Returning Results into Your Favorite Tool. SQL query results can be returned to MS Excel, using Data > Import External Data > New Database Query (details: Excel Advanced Report Development, Zapawa 2005, in the Lane catalog), or to Spotfire.
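The slide's route into Excel is an ODBC database query. A simpler alternative, swapped in here and not described in the slides, is to write the results to a CSV file that Excel or Spotfire can open. A minimal sketch, assuming Python's sqlite3 and csv modules, a hypothetical marker_map_location table with made-up rows, and an illustrative helper name query_to_csv:

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE marker_map_location (marker_id INTEGER, chromosome_name TEXT,
                                      position REAL);
    INSERT INTO marker_map_location VALUES (1, '1', 10.5), (2, '1', 55.0);
""")

def query_to_csv(conn, sql, out):
    """Run `sql` and write a header row plus all result rows to file-like `out`."""
    cursor = conn.execute(sql)
    writer = csv.writer(out)
    # cursor.description has one 7-tuple per result column; item 0 is its name.
    writer.writerow([col[0] for col in cursor.description])
    writer.writerows(cursor)
    return out

buf = query_to_csv(conn, "SELECT * FROM marker_map_location ORDER BY marker_id",
                   io.StringIO())
print(buf.getvalue())
```

To produce a real file, pass `open("markers.csv", "w", newline="")` instead of the StringIO buffer.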
  • 29. Next: BioWarehouse. Produced by SRI International. Integrates genome, biochemical reaction, pathway and other databases from many different organisms. Free, open source. See: Accessing PublicHouse; FAQ; Schema; all PubMed references pertaining to BioWarehouse and written by the BioWarehouse group.
  • 30. Conceptual Views of the BioWarehouse Database
  • 31.
  • 32. Querying BioWarehouse. We’ll query using MySQL Query Browser. Caveats: lots of datasets are supported by BioWarehouse, but some critical ones are missing from PublicHouse due to licensing requirements, e.g., MetaCyc and UniProt. Also: you need to request an account to query; the anonymous user is not supported. Resource: MySQL v5 Reference Manual.
  • 33. BioWarehouse Demo Queries… to get you started. Query 1: What are the datasets available in PublicHouse? Query 2: How many pathways are there for the EcoCyc dataset?
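Both demo queries can be sketched against toy tables. This assumes Python's sqlite3 and hypothetical, reduced tables: Pathway(WID, Name, DataSetWID) uses the field names from the slide 34 query, while the DataSet(WID, Name) table and its columns are an assumption about the BioWarehouse schema; all rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DataSet (WID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Pathway (WID INTEGER PRIMARY KEY, Name TEXT, DataSetWID INTEGER);
    INSERT INTO DataSet VALUES (1, 'EcoCyc'), (2, 'SAUR158878Cyc');
    INSERT INTO Pathway VALUES (10, 'pathway_A', 1), (11, 'pathway_B', 1),
                               (12, 'pathway_C', 2);
""")

# Query 1: which datasets are available?
datasets = [name for (name,) in conn.execute("SELECT Name FROM DataSet ORDER BY Name")]

# Query 2: how many pathways belong to the EcoCyc dataset?
# Joining on the dataset name avoids hard-coding its WID.
(n_ecocyc,) = conn.execute("""
    SELECT COUNT(*)
    FROM Pathway p
    JOIN DataSet d ON p.DataSetWID = d.WID
    WHERE d.Name = 'EcoCyc'
""").fetchone()

print(datasets, n_ecocyc)
```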
  • 34. Example Biologically Meaningful Query Of BioWarehouse: For a Given Pathway, Return the Proteins Involved in the Pathway and Their Molecular Weight. SELECT D.Name AS PathwayName, J.WID AS ProteinWID, J.Name AS ProteinName, J.MolecularWeightCalc AS MolecularWeightCalc FROM Pathway D, PathwayReaction F, Reaction G, EnzymaticReaction H, Protein J WHERE D.WID = F.PathwayWID AND F.ReactionWID = G.WID AND G.WID = H.ReactionWID AND H.ProteinWID = J.WID AND D.DataSetWID = 19 AND D.Name LIKE '%lipopolysaccharide%' ORDER BY ProteinName
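The query above walks five tables: Pathway → PathwayReaction → Reaction → EnzymaticReaction → Protein. A minimal sketch of the same logic, rewritten with explicit JOIN … ON clauses (equivalent to the comma-and-WHERE form on the slide), assuming Python's sqlite3 and hypothetical, reduced tables that carry only the fields the query uses; all names, WIDs and weights are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Pathway (WID INTEGER PRIMARY KEY, Name TEXT, DataSetWID INTEGER);
    CREATE TABLE PathwayReaction (PathwayWID INTEGER, ReactionWID INTEGER);
    CREATE TABLE Reaction (WID INTEGER PRIMARY KEY);
    CREATE TABLE EnzymaticReaction (ReactionWID INTEGER, ProteinWID INTEGER);
    CREATE TABLE Protein (WID INTEGER PRIMARY KEY, Name TEXT, MolecularWeightCalc REAL);
    INSERT INTO Pathway VALUES (1, 'lipopolysaccharide biosynthesis', 19),
                               (2, 'glycolysis', 19),
                               (3, 'lipopolysaccharide core', 7);  -- other dataset
    INSERT INTO PathwayReaction VALUES (1, 100), (1, 101), (2, 102), (3, 103);
    INSERT INTO Reaction VALUES (100), (101), (102), (103);
    INSERT INTO EnzymaticReaction VALUES (100, 500), (101, 501), (102, 502), (103, 503);
    INSERT INTO Protein VALUES (500, 'enzyme_B', 20.0), (501, 'enzyme_A', 10.0),
                               (502, 'enzyme_C', 30.0), (503, 'enzyme_D', 40.0);
""")

rows = conn.execute("""
    SELECT D.Name AS PathwayName, J.WID AS ProteinWID, J.Name AS ProteinName,
           J.MolecularWeightCalc AS MolecularWeightCalc
    FROM Pathway D
    JOIN PathwayReaction F ON D.WID = F.PathwayWID
    JOIN Reaction G ON F.ReactionWID = G.WID
    JOIN EnzymaticReaction H ON G.WID = H.ReactionWID
    JOIN Protein J ON H.ProteinWID = J.WID
    WHERE D.DataSetWID = 19 AND D.Name LIKE '%lipopolysaccharide%'
    ORDER BY ProteinName
""").fetchall()

for row in rows:
    print(row)
```

Pathway 3 matches the LIKE pattern but is dropped by the DataSetWID = 19 filter, and the glycolysis pathway is dropped by LIKE.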
  • 35. Exercises. Together (10 min): 1. How many datasets are there in PublicHouse? 2. What is the number of genes in S. aureus (SAUR158878Cyc)? On your own (20 min): 3. List the coding region starts and ends for all genes that code for proteins in the SAUR158878Cyc dataset. 4. How many biochemical reactions are there in each pathway (of any type) in the EcoCyc (= E. coli) dataset?
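Exercise 4 is a GROUP BY over pathways. A minimal sketch, assuming Python's sqlite3 and hypothetical, reduced tables: Pathway and PathwayReaction use the field names from the slide 34 query, while DataSet(WID, Name) is an assumed table; the dataset 'OtherCyc' and all rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DataSet (WID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Pathway (WID INTEGER PRIMARY KEY, Name TEXT, DataSetWID INTEGER);
    CREATE TABLE PathwayReaction (PathwayWID INTEGER, ReactionWID INTEGER);
    INSERT INTO DataSet VALUES (1, 'EcoCyc'), (2, 'OtherCyc');
    INSERT INTO Pathway VALUES (10, 'pathway_A', 1), (11, 'pathway_B', 1),
                               (12, 'pathway_C', 1), (13, 'pathway_D', 2);
    -- pathway_C deliberately has no reactions.
    INSERT INTO PathwayReaction VALUES (10, 100), (10, 101), (10, 102),
                                       (11, 103), (13, 104), (13, 105);
""")

# LEFT JOIN keeps pathways with no reactions; COUNT of a NULL column gives 0.
rows = conn.execute("""
    SELECT p.Name, COUNT(pr.ReactionWID) AS n_reactions
    FROM Pathway p
    JOIN DataSet d ON p.DataSetWID = d.WID
    LEFT JOIN PathwayReaction pr ON pr.PathwayWID = p.WID
    WHERE d.Name = 'EcoCyc'
    GROUP BY p.WID, p.Name
    ORDER BY n_reactions DESC, p.Name
""").fetchall()

print(rows)
```

With a plain (inner) JOIN, pathway_C would silently disappear from the result instead of being reported with zero reactions.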
  • 36. In Summary… Knowing the db schema is essential. The SELECT statement is all you need to know. Remote databases are good for exploring a schema at low cost (no installation), but performance can be poor and there are restrictions on the data sets; it is better to install locally if “real work” is to be performed. Remember: SQL gives you the power to return results directly into your favorite tool!
  • 37. Don’t Forget The Class Evaluation
  • 38. Resources
  • 39. Setting Up for Internet SQL Querying
  • 40. Setting Up Data Source Names. Steps: 1. Make sure you have the requisite driver (next slide). 2. Create a Data Source Name (Windows only). 3. Write your query. 4. Get the results back into Excel! See the Lane videorecorded class Managing Experiment Data Using Excel and Friends: Digging Out from Under the Avalanche for lots more details.
  • 41. Step 1: Getting Drivers Essential for SQL Querying. A driver is a piece of software (a “data connectivity” tool) that lets your operating system talk to a database. Each database engine (Oracle, MySQL, etc.) requires its own driver, which generally must be installed by the user; installation is simple. Installed drivers are visible in the ODBC manager. Drivers are needed by the Data Source Name tool and by querying programs.
  • 42. MySQL Driver: Needed to Query MySQL Databases. Windows: download MySQL Connector/ODBC 3.51 here. It must be installed for direct querying using, e.g., Excel; it is not necessary if you are using the MySQL Query Browser.
  • 43. Oracle Driver: Needed to Query Oracle Databases. Installing the “client” software will also install the driver. Windows: download 10g Client here. Mac: download 10g Client here. A free Oracle user account is required to download. It must be installed if you are querying using MS Query or any other query browser involving Oracle.
  • 44. Step 2: Creating a Data Source Name. A Data Source Name (DSN) tells programs on your PC where and how to query a database. Populating the fields: Data Source Name: a unique name of your choice; Description: anything; Server: exactly as given by the database provider; Port number: as specified by the database provider (defaults: MySQL: 3306; Oracle: 1521; MS Access: N/A).
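A DSN stores a handful of fields; the same fields can also be passed directly in an ODBC connection string. A minimal sketch, assuming a MySQL Connector/ODBC 3.51 driver registered under the name "MySQL ODBC 3.51 Driver" and using the standard DRIVER/SERVER/PORT/DATABASE/UID/PWD keywords; the host, database and credentials below are hypothetical placeholders, and the helper only builds the string without connecting.

```python
# Default ports as listed on the slide (MS Access has none).
DEFAULT_PORTS = {"mysql": 3306, "oracle": 1521}

def mysql_odbc_connection_string(server, database, user, password,
                                 port=None, driver="MySQL ODBC 3.51 Driver"):
    """Build a DSN-less ODBC connection string holding the same fields as a DSN."""
    port = port or DEFAULT_PORTS["mysql"]
    parts = {
        "DRIVER": "{" + driver + "}",  # braces protect the spaces in the driver name
        "SERVER": server,              # exactly as given by the database provider
        "PORT": str(port),             # as specified by the provider
        "DATABASE": database,
        "UID": user,
        "PWD": password,
    }
    return ";".join(f"{key}={value}" for key, value in parts.items()) + ";"

conn_str = mysql_odbc_connection_string("db.example.org", "example_db", "guest", "secret")
print(conn_str)
```

An ODBC library such as pyodbc accepts a string like this in `pyodbc.connect(conn_str)`; if a DSN has already been created, `"DSN=<your DSN name>;"` plus credentials can be used instead.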
  • 45. Resources – SQL: eBook: Beginning SQL; eBook: Learning SQL
  • 46. Lots More Resources From Lane
  • 47.
  • 48. How To Get Accounts for Direct SQL Querying: Direct Querying of Selected Bioinformatics Databases
    Database | How? | DB engine
    BioWarehouse | http://biowarehouse.ai.sri.com/: get an account for access to PublicHouse (a publicly accessible installation of BioWarehouse; see http://biowarehouse.ai.sri.com/PublicHouse Overview.html) | MySQL
    Ensembl | http://www.ensembl.org/info/data/download.html | MySQL
    Mouse Genome Database | Mail mgi-help@informatics.jax.org to ask for an account | Sybase
  • 49. Example Querying with MySQL Query Browser. Free; MySQL only; facilitates writing of a SQL query. Callouts: Execute; graphical statement; query statement; table descriptions. Get it at http://www.mysql.com/products/tools/querybrowser/

Editor's Notes

  1. SELECT marker.marker_id, marker_map_location.chromosome_name, marker_map_location.position, map.map_name FROM marker INNER JOIN marker_map_location ON marker.marker_id = marker_map_location.marker_id INNER JOIN map ON marker_map_location.map_id = map.map_id WHERE marker_map_location.chromosome_name = '19'
  2. SELECT D.Name AS PathwayName, J.WID AS ProteinWID, J.Name AS ProteinName, J.MolecularWeightCalc AS MolecularWeightCalc FROM Pathway D, PathwayReaction F, Reaction G, EnzymaticReaction H, Protein J WHERE D.WID = F.PathwayWID AND F.ReactionWID = G.WID AND G.WID = H.ReactionWID AND H.ProteinWID = J.WID AND D.DataSetWID = 19 AND D.Name LIKE '%lipopolysaccharide%' ORDER BY ProteinName