Semantic Web Approaches to Candidate Gene Identification

Simon Twigger

Slides from my presentation at CSHALS 2010, Boston, February 25th 2010.

1. Semantic Web Approaches to Candidate Gene Identification (Simon Twigger, Ph.D.)

2. http://rgd.mcw.edu

3. Meet the client

4-9. [Diagram, built up over slides 4-9: a Hypertensive rat, the Hypertension phenotype, a QTL, and the genes (G) within the QTL.]

10-16. Rat researchers ask...
• Has anyone done any expression studies using congenic rats?
• What tissue is this gene expressed in?
• Are any of these genes associated with my phenotype?
• What rat expression studies have been done on Mammary Cancer (aka breast neoplasms/breast cancer/cancer of the ...
• What expression data is known for SD (aka SD/NHsd, Harlan Sprague Dawley, Sprague Dawley) rats?
• Has this gene been seen in the brain?

17-18. Biological Data Warehouse: Really important piece of data...

19. NCBI GEO db

20. Data hidden in plain sight

21. NCBO Annotator http://www.bioontology.org/wiki/index.php/Annotator_Web_service

22. Parallel Annotation Workflow: GEO records → create annotation jobs & queue up (Q-Out, RabbitMQ) → 1..n annotation workers index text at OBA → put results into queue for save (Q-In) → parse results → results saved to GMiner database
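The workflow on slide 22 is a producer/consumer pipeline. The sketch below shows the same shape in plain Python, assuming the standard-library `queue.Queue` stands in for RabbitMQ and that `annotate()` is a hypothetical toy matcher standing in for a call to the NCBO Annotator web service (no real web-service API is called). The record IDs, field names and concept IDs are placeholders.

```python
import queue
import threading

# Hypothetical stand-in for the NCBO Annotator: a tiny term list so the
# sketch runs offline. Concept IDs are placeholders.
KNOWN_TERMS = {"kidney": "example:kidney", "liver": "example:liver"}


def annotate(text):
    """Toy annotator: return (term, concept_id) pairs found in text."""
    lowered = text.lower()
    return [(term, cid) for term, cid in KNOWN_TERMS.items() if term in lowered]


def worker(jobs, results):
    """Take jobs off the outbound queue until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:
            break
        record_id, field, text = job
        results.put((record_id, field, annotate(text)))


def run_pipeline(records, n_workers=4):
    """Annotate every text field of every record using n_workers threads."""
    jobs = queue.Queue()     # stands in for Q-Out
    results = queue.Queue()  # stands in for Q-In
    threads = [threading.Thread(target=worker, args=(jobs, results))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    # Create one annotation job per text field and queue it up.
    for record_id, fields in records.items():
        for field, text in fields.items():
            jobs.put((record_id, field, text))
    for _ in threads:
        jobs.put(None)       # one stop sentinel per worker
    for t in threads:
        t.join()
    # Drain Q-In; a real pipeline would parse these and save them to a database.
    saved = []
    while not results.empty():
        saved.append(results.get())
    return sorted(saved)
```

For example, `run_pipeline({"GSM1": {"title": "Kidney from SS rat"}})` returns `[("GSM1", "title", [("kidney", "example:kidney")])]`. Threads are adequate here because each real job would mostly wait on a network call; a message broker such as RabbitMQ additionally lets workers run as separate processes on other machines.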

23. Ontologies

24. gminer.mcw.edu

25-27. Curation of Results

28-31. Addition of new annotations: NCBO Ontology Widgets http://www.bioontology.org/wiki/index.php/Ontology_Widgets

32-34. Browse/Review Results

35-41. Linking annotations to data
• Genes: Tm2d1, RGD1306410, Svs4, Hbb, Scgb2a1, Alb
• Example triples: Hbb is_expressed_in rat kidney; Tm2d1 is_expressed_in rat kidney
• Arrays: Human (U133, U133v2), Mouse (430, U74, U95) and Rat (U34a/b/c, 230, 230v2)
• 62,000 samples x ca. 25,000 genes/sample = 1.5B data points
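The data-point figure on slide 41 is the number of samples multiplied by the approximate number of genes measured per sample:

$$62{,}000 \times 25{,}000 = 1.55 \times 10^{9} \approx 1.5 \text{ billion data points}$$

Because the per-sample figure is approximate ("ca. 25,000"), 1.5B is best read as an order-of-magnitude estimate.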

42. Probeset results on GMiner Probeset 1395269_s_at for Gabrd - gamma-aminobutyric acid (GABA) A receptor, delta

43-44. Probeset results on GMiner: rat Gabrd compared with human (Hs) GABRD

45-49. RDF Data integration: four sources loaded into a Triple Store
• Probeset to MA
• Rat Genes & xrefs
• Probeset to RGD ID
• Mouse Anatomy Ontology
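Once the sources share identifiers, integration in a triple store reduces to pattern matching and joins. In Sesame, AllegroGraph or Virtuoso (the stores named in the notes) this would be a SPARQL query; the sketch below imitates it in plain Python with a set of tuples. It models only three of the four sources and assumes made-up predicate names (`detected_in`, `maps_to`, `label`) and placeholder identifiers.

```python
# Toy triples; every identifier is a hypothetical placeholder, not a real
# probeset, gene or Mouse Anatomy (MA) term.
probeset_to_ma = {("probeset:ps_1", "detected_in", "ma:term_1"),
                  ("probeset:ps_2", "detected_in", "ma:term_2")}
probeset_to_gene = {("probeset:ps_1", "maps_to", "gene:GeneA"),
                    ("probeset:ps_2", "maps_to", "gene:GeneB")}
anatomy_ontology = {("ma:term_1", "label", "kidney"),
                    ("ma:term_2", "label", "brain")}

# Loading every source into one store is simply a union of triple sets.
STORE = probeset_to_ma | probeset_to_gene | anatomy_ontology


def match(store, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [t for t in store
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def genes_in_tissue(store, label):
    """Join: anatomy label -> MA term -> probesets -> genes."""
    genes = set()
    for term, _, _ in match(store, p="label", o=label):
        for probeset, _, _ in match(store, p="detected_in", o=term):
            for _, _, gene in match(store, s=probeset, p="maps_to"):
                genes.add(gene)
    return genes


print(genes_in_tissue(STORE, "kidney"))
```

A linear scan is fine for a toy example; a real triple store indexes subjects, predicates and objects so that joins over billions of triples stay tractable.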

50-57. [Diagram, built up over slides 50-57: the genes (G) in the QTL of the Hypertensive rat are linked to Hypertension through added layers: Pathway; Component, Function, Process; Anatomy (Kidney); and strain differences (Str 1 != Str 2).]
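The diagram suggests narrowing the genes in a QTL by overlaying further evidence. One simple way to express that idea, offered as an illustrative counting heuristic rather than the method presented in the talk, is to treat each layer as a set of gene symbols and rank the QTL genes by how many layers support them. The gene names and layer labels below are hypothetical.

```python
def prioritize(qtl_genes, evidence):
    """Rank QTL genes by how many evidence layers contain them.

    evidence maps a layer label (e.g. "pathway") to a set of gene symbols.
    Returns (gene, supporting_layers) pairs, best-supported first.
    """
    ranked = []
    for gene in qtl_genes:
        hits = sorted(label for label, genes in evidence.items() if gene in genes)
        ranked.append((gene, hits))
    # More supporting layers first; ties broken alphabetically for stable output.
    ranked.sort(key=lambda pair: (-len(pair[1]), pair[0]))
    return ranked


qtl_genes = {"GeneA", "GeneB", "GeneC"}  # hypothetical genes in the QTL
evidence = {
    "pathway": {"GeneA", "GeneB"},
    "kidney": {"GeneA"},                 # e.g. expressed in kidney
    "strain_difference": {"GeneA", "GeneC"},
}
for gene, support in prioritize(qtl_genes, evidence):
    print(gene, len(support), support)
```

Counting treats every layer as equally informative; weighting layers, or requiring certain ones such as a strain difference, would be natural refinements.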

58. Ongoing
• Work on improving term recognition
• Additional ontologies: Cell Type, Drugs, Phenotype, Disease
• RDFizing (what URIs to use?)
• Triple Store implementation
• Integrate strain and tissue results into RGD

59. Acknowledgements
• Joey Geiger - Development of GMiner
• Jennifer Smith - Video creation, data curation
• Rajni Nigam - Rat Strain Ontology
• Clement Jonquet - NCBO Annotator tools
• Mark Musen & NIH Roadmap Initiative - Our Funding!

60. Links
• http://gminer.mcw.edu Web application
• http://github.com/mcwbbc/gminer GMiner code
• http://github.com/simont/MCW-RDF RDFizer code
Email: simont@mcw.edu
Twitter: @simon_t

Editor's Notes

The Rat Genome Database is one of the main projects we have at MCW. It is the model organism database for the laboratory rat, Rattus norvegicus. We curate genes, strains, QTL, etc., and make extensive use of ontologies such as GO, pathway, rat strain, disease and phenotype.
This is a typical use case for rat genomics: how do we identify the causes of hypertension in a hypertensive rat? Quite often a QTL is mapped, indicating a region on the chromosome that is statistically associated with the trait in question. How do we go from the genes in that region to the cause of the disease? Not an easy task: ‘then a miracle happens’.
Rat biologists ask many questions related to gene expression and diseases; these are some typical examples. Many of these questions fall in areas covered by ontologies and would benefit from the additional search flexibility that ontologies provide.
Technical problem: lots of data is being stored, but it is hard to find again (the slide shows a government warehouse). Data is archived with good intentions, but in the process it often becomes hard to find. If you can't find the data, it's not much use.
NCBI's Gene Expression Omnibus has a lot of relevant data, either as text or raw data.
Can we start to capture some of this information in an informatically tractable fashion, using ontologies and the OBA tools at the National Center for Biomedical Ontology in an annotation pipeline? The red boxes highlight some concepts of interest: the rat strains and tissues used in this experiment. A human can read these and know what's going on, but what about a computer?
Driving biological project - use NCBO Annotator web services to mark up the text in the GEO records using ontologies
Take sections of text from GEO records, create annotation jobs and place them in a queue. Workers take the jobs off the queue and index the text against the appropriate ontologies at NCBO. Results are placed on the input queue for saving back to the database.
We are currently using two ontologies: the Rat Strain Ontology created at RGD and the Mouse Gross Anatomy Ontology created at The Jackson Laboratory (JAX).
GEO data is run through the pipeline and loaded into GMiner for curation and analysis.
Annotated results can be reviewed and verified; some annotations are missed, such as the Sprague Dawley link.
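Slide 16 shows how many names one strain can go by (SD, SD/NHsd, Harlan Sprague Dawley, Sprague Dawley), which is one reason annotations get missed. A common way to improve recall is to match every synonym of a concept rather than only its preferred name. The sketch below is a simple regular-expression matcher with a hypothetical two-concept lexicon and placeholder IDs; it is not how the NCBO Annotator works internally. Matching is case-sensitive so that short abbreviations such as SD and SS only match in upper case.

```python
import re

# Hypothetical mini-lexicon: concept ID -> every name it may appear under.
# The IDs are placeholders, not real Rat Strain Ontology identifiers.
LEXICON = {
    "strain:SD": ["Sprague Dawley", "SD", "SD/NHsd", "Harlan Sprague Dawley"],
    "strain:SS": ["SS", "Dahl salt-sensitive"],
}


def build_patterns(lexicon):
    """Compile one regex per concept; longer synonyms are tried first."""
    patterns = {}
    for concept, names in lexicon.items():
        ordered = sorted(names, key=len, reverse=True)
        alternatives = "|".join(re.escape(name) for name in ordered)
        # Refuse matches inside a longer token (letters, digits, _ or /).
        patterns[concept] = re.compile(r"(?<![\w/])(?:" + alternatives + r")(?![\w/])")
    return patterns


def recognize(text, patterns):
    """Return the sorted concept IDs whose synonyms occur in text."""
    return sorted(concept for concept, pattern in patterns.items()
                  if pattern.search(text))


PATTERNS = build_patterns(LEXICON)
print(recognize("Kidney from Harlan Sprague Dawley rats", PATTERNS))
```

Note that a spelling variant not in the lexicon (for example "Sprague-Dawley" with a hyphen) is still missed, so the quality of the synonym list matters as much as the matcher.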
New annotations can be added using the NCBO Ontology Widgets.
Put the OBA system on an Amazon AMI so that it can be instantiated at will. Would this allow users to run as many of these as they want? Consider using a virtual machine?
Initial results focusing on GEO rat datasets have provided a lot of useful information and allowed us to create some handy navigational interfaces to the data, enabling queries that were not possible on any other site. Want to find expression data for the SS rat kidney? Click the terms and the datasets appear.
Can we link from the annotations to the samples, down to the raw data in each sample, and from there to the genes involved? Affymetrix chips provide a detection call, a fairly conservative present/absent call indicating whether the probe set was detected in that particular sample.
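The chain from annotation to sample to probe set to gene can be sketched as a few lookups. All the data below is hypothetical toy data: the sample names, probe set names and their gene mappings are illustrative only. Only "P" (present) calls are kept, matching the conservative use of the detection call described above.

```python
# Hypothetical toy data: sample annotations, per-sample detection calls
# ("P" present, "M" marginal, "A" absent) and a probe-set-to-gene map.
SAMPLE_TISSUE = {"sample_1": "kidney", "sample_2": "liver", "sample_3": "kidney"}
DETECTION_CALLS = {
    "sample_1": {"probe_a": "P", "probe_b": "A", "probe_c": "P"},
    "sample_2": {"probe_a": "P", "probe_b": "P", "probe_c": "A"},
    "sample_3": {"probe_a": "M", "probe_b": "A", "probe_c": "P"},
}
PROBE_TO_GENE = {"probe_a": "Hbb", "probe_b": "Alb", "probe_c": "Tm2d1"}

def genes_in_tissue(tissue):
    """Follow tissue annotation -> samples -> present probe sets -> genes."""
    genes = set()
    for sample, sample_tissue in SAMPLE_TISSUE.items():
        if sample_tissue != tissue:
            continue
        for probe, call in DETECTION_CALLS[sample].items():
            if call == "P":                   # keep only "present" calls
                gene = PROBE_TO_GENE.get(probe)
                if gene is not None:          # some probe sets map to no gene
                    genes.add(gene)
    return sorted(genes)
```

With this toy data, `genes_in_tissue("kidney")` returns `["Hbb", "Tm2d1"]`. Each of those pairs could then be stated as a "gene is_expressed_in tissue" assertion.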
We can then relate the probe sets to genes, and the genes to the ontology annotations, to create triples such as this. If we do this for the Affymetrix data in GEO for rat, mouse and human, we will have upwards of 1.5B data points to encode.
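The 1.5B figure follows from the approximate counts on the slide (about 62,000 samples across the human, mouse and rat Affymetrix platforms, and roughly 25,000 genes measured per sample):

```latex
\[
\underbrace{62{,}000}_{\text{samples}} \times \underbrace{25{,}000}_{\text{genes/sample}}
  = 1.55 \times 10^{9} \approx 1.5\,\text{billion data points}
\]
```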
For each probe set we can look at the samples in which it was tested and see whether it was called present, absent or marginal. We can then compile these calls to get a feel for how often a gene was seen in a particular tissue or organ.
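A minimal sketch of that compilation step, assuming the input is a list of (tissue, call) pairs for one probe set. The function name is hypothetical. Marginal calls are counted as tested but not present here. That is a design choice; counting "M" separately or as half-present would be equally defensible.

```python
from collections import defaultdict

def tissue_distribution(observations):
    """Fraction of tested samples, per tissue, in which the probe set was called present.

    observations: iterable of (tissue, call) pairs for one probe set,
    where call is "P" (present), "M" (marginal) or "A" (absent).
    """
    tested = defaultdict(int)
    present = defaultdict(int)
    for tissue, call in observations:
        tested[tissue] += 1
        if call == "P":
            present[tissue] += 1
    return {tissue: present[tissue] / tested[tissue] for tissue in sorted(tested)}
```

For example, two present calls out of four kidney samples and none out of two liver samples gives `{"kidney": 0.5, "liver": 0.0}`, which is the kind of value a tissue-distribution chart would plot.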
This can be viewed as a chart of tissue distribution. When compared with similar results from GeneCards/Novartis BioGPS, the results are quite comparable, indicating that this approach has some merit.
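The notes don't say how the comparison with BioGPS was made. One simple way to put a number on "quite comparable" is a correlation over the tissues both sources cover. The sketch below uses Pearson correlation; this is an assumption, not the method used in the talk. A rank-based measure such as Spearman's may suit better, because present-call fractions and expression levels are on different scales. The function names are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("a profile with no variation has undefined correlation")
    return cov / (sd_x * sd_y)

def compare_profiles(ours, reference):
    """Correlate two tissue -> value profiles over the tissues they share."""
    shared = sorted(set(ours) & set(reference))
    return pearson([ours[t] for t in shared], [reference[t] for t in shared])
```

Tissues present in only one source are ignored, so the score reflects agreement on the shared tissues only.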
Experimenting with exporting this data to RDF and integrating it with related data and vocabularies in triple stores such as Sesame, AllegroGraph and Virtuoso. Early days; still climbing the learning curve with this!
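A minimal export sketch writes the triples as N-Triples, a line-based RDF syntax that triple stores such as Sesame, AllegroGraph and Virtuoso can typically bulk-load. The base URI `http://example.org/gminer/` is a hypothetical placeholder. A real export would use the project's own URIs or established vocabularies instead of ad hoc `gene/`, `relation/` and `tissue/` paths.

```python
from urllib.parse import quote

# Hypothetical namespace; replace with real, dereferenceable URIs.
BASE = "http://example.org/gminer/"

def iri(kind, name):
    """Build an IRI under BASE, percent-encoding spaces and other unsafe characters."""
    return f"<{BASE}{kind}/{quote(name)}>"

def to_ntriples(triples):
    """Serialize (gene, predicate, tissue) tuples as N-Triples, one statement per line."""
    return "".join(
        f"{iri('gene', gene)} {iri('relation', predicate)} {iri('tissue', tissue)} .\n"
        for gene, predicate, tissue in triples
    )
```

`to_ntriples([("Hbb", "is_expressed_in", "rat kidney")])` produces a single line ending in ` .`, with the space in "rat kidney" encoded as `%20`.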
As we start to create these triples, we can bridge the gap from the QTL and its genes to the disease. This allows scientists to identify or prioritize candidate genes in their QTL regions (or gene lists) and saves them (to some degree) from spending a lot of time manually searching online databases.
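A sketch of one possible prioritization rule is shown below. It ranks the genes in a QTL region by how many disease-relevant tissues they are expressed in, using "is_expressed_in" triples. The scoring rule, the function name and the idea of a hand-picked list of disease-relevant tissues are all assumptions for illustration. The talk does not specify a ranking method.

```python
def prioritize(qtl_genes, triples, disease_tissues):
    """Rank QTL genes by the number of disease-relevant tissues they are expressed in."""
    relevant = set(disease_tissues)
    scores = {gene: 0 for gene in qtl_genes}
    for gene, predicate, tissue in set(triples):   # set() drops duplicate triples
        if predicate == "is_expressed_in" and gene in scores and tissue in relevant:
            scores[gene] += 1
    # Highest score first; ties broken alphabetically for a stable order
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

Genes outside the QTL region are ignored even if they are expressed in a relevant tissue. Genes with no supporting triples stay in the list with a score of 0, so nothing is silently dropped.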
Acknowledgements