Ontology application and use at
the ENCODE DCC
Venkat Malladi
Data Wrangler, ENCODE DCC
Department of Genetics
Stanford University School of Medicine
Venkat Malladi ENCODE DCC
Overview
Intro to ENCODE and the DCC
Metadata Model
Ontologies
Search
Future directions
What is ENCODE?
Figure modified from PLoS Biol 9:e1001046, 2011 (M. Pazin)
Approximately 30 different assays
Role of the Data Coordination Center
[Diagram labels: Production labs, Analysis groups, Data files, Metadata, ENCODE portal (DCC), Genome Browser, DCC, Integrative websites, Scientific community]

Production labs, Analysis groups
  Role: Data generation
  Tasks: Perform assays; Perform analyses; Validate data; Submit data files; Submit metadata
ENCODE portal (DCC)
  Role: Data organization
  Tasks: Data processing & validation; Data file storage; Metadata curation
Scientific community
  Role: Data access
  Tasks: Web-based searches; Data downloads
Challenge: Find common biosamples from data generated by two consortia
ENCODE cell types: 356 terms (http://encodeproject.org/ENCODE/cellTypes.html)
GEO: 314 terms (GEO characteristics: common_name, tissue_type, cell_type, lines)
Projects are internally consistent…
Simple text match
Cell type: 360 terms
GEO: 314 terms
… but only 3 biosample names match exactly between projects: IMR90, PBMC, Th17
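A simple text match of this kind can be sketched as a set intersection. This is a minimal sketch, not the DCC's code: the function name and the two short name lists are hypothetical excerpts chosen for illustration, and the assignment of names to each project is an assumption.

```python
def exact_matches(names_a, names_b):
    """Return the names that appear verbatim in both collections."""
    return sorted(set(names_a) & set(names_b))


# Hypothetical excerpts of two projects' biosample name lists.
project_a = ["IMR90", "PBMC", "Th17", "HCM", "Heart_OC"]
project_b = ["IMR90", "PBMC", "Th17", "Fetal Heart", "Right Atrium"]

# Only identical strings are found; differently named but related
# samples (for example heart-derived ones) are missed.
print(exact_matches(project_a, project_b))
```

Because the comparison is on raw strings, it is case- and spelling-sensitive, which is why an exact match finds so few shared biosamples.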
Metadata annotation using Ontologies
An ontology is a set of words and relationships. All relationships must be true.
[Diagram: the terms nucleus, chromosome, mitochondrial chromosome, mitochondrion, and cell, connected from child term to parent term by part_of and is_a relationships; one part_of relationship is marked X.]
An ontology is a set of words and relationships.
The relationships need to be true because inferences can be based upon them.
[Diagram: the terms nucleus, chromosome, mitochondrial chromosome, mitochondrion, and cell, connected from child term to parent term by part_of and is_a relationships; two part_of relationships are marked X, with a True/False legend.]
http://www.geneontology.org/GO.ontology.relations.shtml
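Why a single untrue relationship matters can be sketched with a small inference routine. This is an illustrative sketch only: the edge list is an assumed reading of the flattened diagram, the function name is hypothetical, and the rules used (part_of and is_a are transitive, and a term inherits the part_of relationships of the terms it is_a) are a simplification.

```python
from collections import defaultdict


def infer_part_of(edges):
    """Return every (term, whole) pair implied by is_a and part_of edges."""
    parents = defaultdict(set)  # term -> {(relation, parent)}
    for child, relation, parent in edges:
        parents[child].add((relation, parent))

    inferred = set()

    def walk(start, term, crossed_part_of, seen):
        for relation, parent in parents[term]:
            # Once a part_of edge has been crossed, everything further
            # up the path is a whole that `start` is part of.
            now_part_of = crossed_part_of or relation == "part_of"
            if (parent, now_part_of) in seen:
                continue
            seen.add((parent, now_part_of))
            if now_part_of:
                inferred.add((start, parent))
            walk(start, parent, now_part_of, seen)

    for term in list(parents):
        walk(term, term, False, set())
    return inferred


# Assumed edges (child, relation, parent) based on the diagram's terms.
true_edges = [
    ("mitochondrial chromosome", "is_a", "chromosome"),
    ("mitochondrial chromosome", "part_of", "mitochondrion"),
    ("mitochondrion", "part_of", "cell"),
    ("nucleus", "part_of", "cell"),
]
# Not always true: a mitochondrial chromosome is not in the nucleus.
false_edge = ("chromosome", "part_of", "nucleus")

good = infer_part_of(true_edges)
bad = infer_part_of(true_edges + [false_edge])
print(("mitochondrial chromosome", "cell") in good)      # True
print(("mitochondrial chromosome", "nucleus") in good)   # False
print(("mitochondrial chromosome", "nucleus") in bad)    # True
```

With the untrue edge included, the routine concludes that a mitochondrial chromosome is part of the nucleus, which is the kind of wrong inference the slide warns about.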
Why use ontologies?
Reason 1: Consistent way of describing biological concepts
Reason 2: Consistent language makes it easy to identify related data
Reason 3: Consistent data analysis, because relationships between terms allow flexible grouping while everyone uses the same set of metadata
What metadata is annotated with ontologies?
1. The biological sample serving as input (Biosample)
2. The reagents and conditions applied to the biological input (Treatment)
3. The set of methods and conditions used to survey the biological input (Assay)
Biosample ontologies
1. Uber-anatomy ontology (Uberon) - structure, location and heterogeneous mixture of cells
2. Cell Ontology (CL) - primary cells or stem cells
3. Experimental Factor Ontology (EFO) - no direct corresponding anatomical structure or physiological cell type
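The choice among the three biosample ontologies can be sketched as a lookup. This is a minimal sketch under stated assumptions: the biosample type labels and the function name are hypothetical, and falling back to EFO for every other type is a simplification of "no direct corresponding anatomical structure or physiological cell type".

```python
# Hypothetical biosample type labels mapped to the ontology named on the slide.
ONTOLOGY_FOR_TYPE = {
    "tissue": "Uberon",     # structure, location, heterogeneous mixture of cells
    "primary cell": "CL",   # Cell Ontology
    "stem cell": "CL",
}


def ontology_for(biosample_type):
    """Pick the ontology used to annotate a biosample of the given type."""
    # Assumption: anything without an anatomical structure or
    # physiological cell type is annotated with EFO.
    return ONTOLOGY_FOR_TYPE.get(biosample_type, "EFO")


print(ontology_for("tissue"))
print(ontology_for("immortalized cell line"))
```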
Challenge: Find all heart-related tissues?
Heart_OC, HCF, HCFaa, HCM, others?
Fetal Heart, Heart, Right Atrium, Right Ventricle, others?
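Once each biosample is annotated with an ontology term, the heart question becomes a walk up the term hierarchy. The sketch below is illustrative only: the annotations and the parent links are hypothetical simplifications written with term names, whereas real annotations would use ontology identifiers and the full Uberon and CL graphs.

```python
def ancestors(term, parents):
    """Return every term reachable by following parent links upward."""
    seen = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def biosamples_under(root, annotations, parents):
    """Names of biosamples annotated with `root` or any term beneath it."""
    return sorted(
        name
        for name, term in annotations.items()
        if term == root or root in ancestors(term, parents)
    )


# Hypothetical annotations: biosample name -> ontology term name.
annotations = {
    "Heart": "heart",
    "Fetal Heart": "heart",
    "Right Atrium": "right cardiac atrium",
    "Right Ventricle": "heart right ventricle",
    "HCM": "cardiac muscle cell",
    "IMR90": "fibroblast of lung",
}
# Hypothetical, simplified parent links (is_a and part_of merged).
parents = {
    "right cardiac atrium": ["heart"],
    "heart right ventricle": ["heart"],
    "cardiac muscle cell": ["heart"],
    "fibroblast of lung": ["lung"],
}

print(biosamples_under("heart", annotations, parents))
```

The query no longer depends on how each project spelled its sample names, only on the terms the samples were annotated with.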
Searching ENCODE metadata
Ontology-driven search
Future directions
• Additional ontologies
• Ontology-based data validations
Additional ontologies
• Protein Ontology (PRO, http://pir.georgetown.edu/pro/pro.shtml)
o transforming growth factor beta-1 (human) - PR:P01137
• EDAM Ontology (EDAM, http://edamontology.org)
o FASTQ - format:1930, BAM - format:2572
o sequence alignment - data:0863
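One way such identifiers could support validation is to check that a file's declared format agrees with its EDAM annotation. This is a hedged sketch, not the DCC's implementation: the record layout (`file_format`, `edam_id`) and the function name are hypothetical; only the two format identifiers come from the slide.

```python
# EDAM format identifiers as listed on the slide.
EDAM_FORMAT_IDS = {"FASTQ": "format:1930", "BAM": "format:2572"}


def validate_file(record):
    """Return a list of problems found in a hypothetical file metadata record."""
    errors = []
    file_format = record.get("file_format")
    expected = EDAM_FORMAT_IDS.get(file_format)
    if expected is None:
        errors.append(f"unknown file format: {file_format!r}")
    elif record.get("edam_id") != expected:
        errors.append(
            f"{file_format} should be annotated {expected}, "
            f"got {record.get('edam_id')!r}"
        )
    return errors


print(validate_file({"file_format": "BAM", "edam_id": "format:2572"}))
print(validate_file({"file_format": "BAM", "edam_id": "format:1930"}))
```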
Ontology-based validations
Acknowledgments
Nikhil Podduturi, Laurence Rowe, Forrest Tanaka
Esther Chan, Jean Davidson, Venkat Malladi, Cricket Sloan, J. Seth Strattan
Eurie Hong, Mike Cherry (PI), Jim Kent (co-PI), Ben Hitz
Brian Lee, Stuart Miyasato, Matt Simison, Zhenhua Wang, Marcus Ho
Data Wranglers
Software Engineers
QA, administration, biocuration
National Institute of General Medical Sciences of the United States National Institutes of Health (GM10331601); U41 grant from the National Human Genome Research Institute at the U.S. National Institutes of Health (HG006992)
