12. Rat researchers ask...
Has anyone done any expression studies using congenic rats?
What tissue is this gene expressed in?
Are any of these genes associated with my phenotype?
What rat expression studies have been done on Mammary Cancer (aka breast neoplasms/breast cancer/cancer of the ...)?
What expression data is known for SD (aka SD/NHsd, Harlan Sprague Dawley, Sprague Dawley) rats?
Has this gene been seen in the brain?
22. Parallel Annotation Workflow
[Diagram: GEO Records → Create Annotation Jobs & Queue Up → Q-Out (RabbitMQ) → 1..n Annot. Workers (Index text at OBA, Parse Results, Put results into queue for save) → Q-In (RabbitMQ) → Results saved to GMiner database]
40. Linking annotations to data
Genes: Tm2d1, RGD1306410, Svs4, Hbb, Scgb2a1, Alb
Hbb is_expressed_in rat kidney
Tm2d1 is_expressed_in rat kidney
Affymetrix arrays: Human (U133, U133v2), Mouse (430, U74, U95) and Rat (U34a/b/c, 230, 230v2)
62,000 samples × ca. 25,000 genes/sample = 1.5B data points
42. Probeset results on GMiner
Probeset 1395269_s_at for Gabrd - gamma-aminobutyric acid (GABA) A receptor, delta
54. QTL
[Diagram, built up over slides 54-57: a QTL in a Hypertensive rat contains genes (G). These are linked to Pathway, Component, Function and Process annotations, and then to Hypertension. Slides 56-57 add Anatomy (Kidney). Slide 57 adds the note "Str 1 != Str 2".]
58. Ongoing
• Work on improving term recognition
• Additional ontologies - Cell Type, Drugs, Phenotype, Disease
• RDFizing (what URIs to use?)
• Triple Store implementation
• Integrate strain and tissue results into RGD
59. Acknowledgements
• Joey Geiger - Development of GMiner
• Jennifer Smith - Video creation, data curation
• Rajni Nigam - Rat Strain Ontology
• Clement Jonquet - NCBO Annotator tools
• Mark Musen & NIH Roadmap Initiative - Our Funding!
The Rat Genome Database (RGD) is one of the main projects we have at MCW. It is the model organism database for the laboratory rat, Rattus norvegicus. We curate genes, strains, QTL, etc., and make extensive use of ontologies such as GO and the pathway, rat strain, disease and phenotype ontologies.
This is a typical use case for rat genomics: how do we identify the causes of hypertension in a hypertensive rat? Quite often a QTL has been identified, indicating a region of the chromosome that is statistically associated with the trait in question. How do we get from the genes in that region to the cause of the disease? It is not an easy task ("then a miracle happens").
Rat biologists ask many questions related to gene expression and disease; these are some typical examples.
Many of these questions fall in areas covered by ontologies and would benefit from the additional search flexibility that ontologies provide.
Technical problem: lots of data is being stored, but it is hard to find again.
Government warehouse image. Data is archived with good intentions, but once archived it is often not easy to find again...
If you can't find the data, it's not really much use.
NCBI's Gene Expression Omnibus (GEO) holds a lot of relevant data, both as descriptive text and as raw data.
Can we start to capture some of this information in a computationally tractable fashion, using ontologies and the OBA tools at the National Center for Biomedical Ontology (NCBO) in an annotation pipeline? The red boxes highlight some concepts of interest: the rat strains and tissues used in this experiment. A human can read these and know what's going on, but what about a computer?
Driving biological project: use the NCBO Annotator web services to mark up the text in the GEO records with ontology terms.
Take sections of text from the GEO records, create annotation jobs and place them in a queue.
Workers take the jobs off the queue and index the text against the appropriate ontologies at NCBO.
Results are placed on the input queue for saving back to the database.
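The job/worker/queue pattern can be illustrated with a small sketch. It stands in Python's standard-library `queue.Queue` and threads for RabbitMQ. The `annotate` function is a hypothetical toy that matches a few hard-coded words; it does not call the NCBO Annotator. The record IDs are also made up.

```python
import queue
import threading

# Hypothetical stand-in for the NCBO Annotator call: just flags known words.
KNOWN_TERMS = {"kidney", "liver", "brain"}

def annotate(text):
    """Toy annotator: return the known terms that occur in the text."""
    words = {w.strip(".,;()").lower() for w in text.split()}
    return sorted(words & KNOWN_TERMS)

def worker(q_out, q_in):
    """Take annotation jobs off Q-Out, annotate them, put results on Q-In."""
    while True:
        job = q_out.get()
        if job is None:  # sentinel: no more jobs for this worker
            break
        record_id, text = job
        q_in.put((record_id, annotate(text)))

def run_pipeline(records, n_workers=3):
    """Queue one job per text section, run 1..n workers, collect results."""
    q_out, q_in = queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=worker, args=(q_out, q_in))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    for record_id, text in records.items():
        q_out.put((record_id, text))
    for _ in threads:
        q_out.put(None)  # one sentinel per worker, after all real jobs
    for t in threads:
        t.join()
    # Drain Q-In; the real pipeline saves these to the GMiner database instead.
    results = {}
    while not q_in.empty():
        record_id, terms = q_in.get()
        results[record_id] = terms
    return results
```

For example, `run_pipeline({"sample_a": "Kidney tissue from SS rats", "sample_b": "Whole brain, Sprague Dawley"})` returns `{"sample_a": ["kidney"], "sample_b": ["brain"]}`. Because the workers share only the two queues, more workers can be added without changing the job creation or saving steps.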
We are currently using two ontologies: the Rat Strain Ontology created at RGD and the Mouse Gross Anatomy Ontology created at The Jackson Laboratory (JAX).
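Ontology-based annotation relies on synonyms: "SD/NHsd" and "Harlan Sprague Dawley" should both map to one strain term. The toy dictionary matcher below illustrates this and also shows how a spelling variant gets missed. It is not how the NCBO Annotator is implemented. The term IDs are placeholders, not real Rat Strain Ontology or Mouse Anatomy identifiers.

```python
import re

# Hypothetical mini-vocabularies; IDs are placeholders, not real ontology IDs.
ONTOLOGY_TERMS = {
    "RS:SD_PLACEHOLDER": ("rat strain",
                          ["Sprague Dawley", "SD", "SD/NHsd", "Harlan Sprague Dawley"]),
    "MA:KIDNEY_PLACEHOLDER": ("anatomy", ["kidney", "renal"]),
}

def annotate(text):
    """Return (term_id, matched_text) pairs for the first synonym found per term."""
    hits = []
    for term_id, (_category, synonyms) in ONTOLOGY_TERMS.items():
        # Try longer synonyms first so "Harlan Sprague Dawley" wins over "SD".
        for syn in sorted(synonyms, key=len, reverse=True):
            # Require non-word characters (or text edges) around the synonym.
            pattern = r"(?<!\w)" + re.escape(syn) + r"(?!\w)"
            match = re.search(pattern, text, flags=re.IGNORECASE)
            if match:
                hits.append((term_id, match.group(0)))
                break  # one hit per term is enough for this sketch
    return hits
```

`annotate("Renal cortex from male SD/NHsd rats")` finds both the strain and the anatomy term. `annotate("Sprague-Dawley rat liver")` finds nothing, because the hyphenated spelling is not a listed synonym. This is the kind of missed annotation that manual review in GMiner is meant to catch.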
GEO data is run through the pipeline and loaded into GMiner for curation and analysis.
Annotated results can be reviewed and verified; some annotations are missed, such as the Sprague Dawley link.
New annotations can be added using the NCBO ontology widgets
Put the OBA system on an Amazon AMI so it can be instantiated at will.
Would this allow users to run as many instances as they want?
Consider using a virtual machine instead?
Initial results focusing on GEO rat datasets have provided a lot of useful information and allowed us to create some handy navigational interfaces to the data, enabling queries that were not possible on any other site. Want to find expression data for SS rat kidney? Click the terms and the datasets appear.
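A "click the terms and the datasets appear" interface can be backed by an inverted index from ontology term to dataset. Selecting several terms then intersects their dataset sets. The sketch below assumes that simple design; the dataset IDs and annotations are hypothetical.

```python
from collections import defaultdict

# Hypothetical dataset annotations: dataset ID -> set of ontology terms.
ANNOTATIONS = {
    "dataset_1": {"SS rat", "kidney"},
    "dataset_2": {"SS rat", "liver"},
    "dataset_3": {"SHR rat", "kidney"},
}

def build_index(annotations):
    """Invert dataset -> terms into term -> datasets."""
    index = defaultdict(set)
    for dataset, terms in annotations.items():
        for term in terms:
            index[term].add(dataset)
    return index

def find_datasets(index, *terms):
    """Return datasets annotated with every selected term, sorted by ID."""
    if not terms:
        return []
    matches = set.intersection(*(index.get(t, set()) for t in terms))
    return sorted(matches)
```

With `idx = build_index(ANNOTATIONS)`, `find_datasets(idx, "SS rat", "kidney")` returns `["dataset_1"]`.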
Can we link from the annotations to the samples, down to the raw data in each sample, and from there to the genes involved? Affymetrix chips provide the detection call, a fairly conservative present/absent call indicating whether the probe set was detected in that particular sample.
We can then relate the probe sets to the genes, and the genes to the ontology annotations, to create triples such as these. If we do this for the Affymetrix data in GEO for rat, mouse and human, we will have upwards of 1.5B data points to encode.
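The chain from sample annotation to detection call to gene can be sketched as a join of three lookup tables. It emits a triple only when the call is "P" (present). The probe set names, sample IDs and calls below are hypothetical; the output mirrors the triples on the "Linking annotations to data" slide.

```python
# Hypothetical lookup tables (placeholders, not real probe set IDs or samples).
PROBESET_TO_GENE = {"probeset_A": "Hbb", "probeset_B": "Tm2d1", "probeset_C": "Alb"}
SAMPLE_TISSUE = {"sample_1": "rat kidney"}  # from the ontology annotations
DETECTION = {  # (sample, probe set) -> detection call: P, A or M
    ("sample_1", "probeset_A"): "P",
    ("sample_1", "probeset_B"): "P",
    ("sample_1", "probeset_C"): "A",
}

def expression_triples(probeset_to_gene, sample_tissue, detection):
    """Emit (gene, is_expressed_in, tissue) for every 'present' call."""
    triples = set()
    for (sample, probeset), call in detection.items():
        if call != "P":
            continue  # only present calls count as evidence of expression
        gene = probeset_to_gene.get(probeset)
        tissue = sample_tissue.get(sample)
        if gene and tissue:
            triples.add((gene, "is_expressed_in", tissue))
    return sorted(triples)
```

On these tables it yields `("Hbb", "is_expressed_in", "rat kidney")` and `("Tm2d1", "is_expressed_in", "rat kidney")`. Alb is skipped because its call is absent.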
For each probe set we can look at the samples in which it was tested and see whether it was called present, absent or marginal. We can then compile these calls to get a feel for how often a gene was seen in a particular tissue or organ.
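Compiling calls per tissue amounts to counting. The sketch below takes the detection calls of one probe set, labelled by each sample's tissue, and computes the fraction of samples called present. Counting marginal calls in the denominator but not as present is an assumption made here, a conservative choice in the spirit of the detection call. The example data is hypothetical.

```python
from collections import defaultdict

VALID_CALLS = ("P", "A", "M")  # present, absent, marginal

def tissue_distribution(calls):
    """calls: iterable of (tissue, call) pairs for a single probe set.

    Returns tissue -> fraction of samples with a 'P' call.
    Marginal calls count toward the total but not as present.
    """
    counts = defaultdict(lambda: dict.fromkeys(VALID_CALLS, 0))
    for tissue, call in calls:
        if call not in VALID_CALLS:
            raise ValueError(f"unexpected detection call: {call!r}")
        counts[tissue][call] += 1
    return {tissue: c["P"] / sum(c.values()) for tissue, c in counts.items()}
```

For calls `[("kidney", "P"), ("kidney", "P"), ("kidney", "A"), ("kidney", "M"), ("brain", "A"), ("brain", "A")]`, this gives kidney 0.5 and brain 0.0. Those values are what a tissue distribution chart would plot.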
This can be viewed as a chart of tissue distribution. Compared with similar results from GeneCards/Novartis BioGPS, the results are quite similar, suggesting that this approach has some merit.
We are experimenting with exporting this data to RDF and integrating it with related data and vocabularies in triple stores such as Sesame, AllegroGraph and Virtuoso. Early days; we are still climbing the learning curve with this!
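One low-tech route into a triple store is to serialize the triples as N-Triples, the W3C line-based RDF format that triple stores commonly import. The sketch uses only the standard library. The base URI is a hypothetical placeholder, since which URIs to use is still an open question.

```python
from urllib.parse import quote

# Hypothetical base URI; the real choice of URIs is still undecided.
BASE = "http://example.org/gminer/"

def to_uri(local_name):
    """Turn a label into an IRI: spaces become underscores, the rest is percent-encoded."""
    return "<" + BASE + quote(local_name.replace(" ", "_")) + ">"

def to_ntriples(triples):
    """Serialize (subject, predicate, object) label triples as N-Triples text."""
    return "".join(f"{to_uri(s)} {to_uri(p)} {to_uri(o)} .\n"
                   for s, p, o in triples)
```

`to_ntriples([("Hbb", "is_expressed_in", "rat kidney")])` produces a single line. In it the subject is `<http://example.org/gminer/Hbb>` and the object is `<http://example.org/gminer/rat_kidney>`. A fuller implementation would likely reuse existing ontology URIs for the tissue and strain terms rather than minting new ones.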
As we start to create these triples, we can bridge the gap from the QTL and its genes to the disease. This allows scientists to identify or prioritize candidate genes in their QTL regions (or gene lists), saving them (to some degree) from spending a lot of time manually searching databases online.
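Once expression triples exist, a first-pass prioritization of QTL genes could be a simple score. The sketch below counts how many tissues of interest (e.g. kidney for hypertension) each gene is expressed in. This scoring rule is a hypothetical illustration, not a method from the project. The gene list and triples are example data.

```python
def prioritize(qtl_genes, triples, tissues_of_interest):
    """Rank QTL genes by expression evidence in disease-relevant tissues.

    Score = number of distinct tissues of interest the gene is expressed in.
    Ties are broken alphabetically so the output is deterministic.
    """
    scores = {gene: 0 for gene in qtl_genes}
    for subj, pred, obj in set(triples):  # set() avoids double-counting duplicates
        if pred == "is_expressed_in" and subj in scores and obj in tissues_of_interest:
            scores[subj] += 1
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical example data.
QTL_GENES = ["Alb", "Hbb", "Svs4", "Tm2d1"]
TRIPLES = [
    ("Hbb", "is_expressed_in", "rat kidney"),
    ("Tm2d1", "is_expressed_in", "rat kidney"),
    ("Alb", "is_expressed_in", "rat liver"),
]
```

`prioritize(QTL_GENES, TRIPLES, {"rat kidney"})` puts Hbb and Tm2d1 (score 1) ahead of Alb and Svs4 (score 0). A real ranking would probably also weigh pathway, GO and disease annotations.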