David Portnoy
http://LinkedIn.com/in/DavidPortnoy
312.970.9740
© Copyright 2012-2014 Datalytx, Inc.
Case study in Linked Data and Semantic Web
for the Human Genome domain
NHGRI’s
“GWAS Catalog” Project
National Human Genome Research Institute
About the Project
• Project Name: The National Human Genome Research Institute (NHGRI) Catalog of Published Genome-Wide Association Studies (GWAS Catalog)
• Project Description: Manually curated collection of published GWAS assaying at least 100,000 single-nucleotide polymorphisms (SNPs) and all SNP-trait associations with P < 1 × 10^-5.
• In addition to SNP-trait association data, provides the “Diagram Browser”, an interactive diagram of these associations mapped to the SNPs’ chromosomal locations.
• Stats as of Aug 2014:
  ◦ Almost 2,000 GWAS-related publications
  ◦ Over 14,000 SNPs
• Project Growth: [Chart: # of studies, # of traits and SNP-trait associations, 2005 to 2014]
Website: http://www.genome.gov/gwastudies/
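The two inclusion criteria in the project description (at least 100,000 SNPs assayed, and P < 1 × 10^-5) can be read as a simple filter. The sketch below is only an illustration of that reading; the `Association` type, the function name, the ID `rs0000001` and the p-values are assumptions made for the example, not part of the Catalog's code or data.

```python
from dataclasses import dataclass

MIN_SNPS_ASSAYED = 100_000   # a study must assay at least this many SNPs
P_VALUE_CUTOFF = 1e-5        # an association must have P < 1 x 10^-5


@dataclass(frozen=True)
class Association:
    snp_id: str      # dbSNP reference ID
    trait: str
    p_value: float


def catalog_associations(snps_assayed, associations):
    """Return the associations that meet both inclusion criteria."""
    if snps_assayed < MIN_SNPS_ASSAYED:
        return []
    return [a for a in associations if a.p_value < P_VALUE_CUTOFF]


# Illustrative input only; the second ID and both p-values are made up.
candidates = [
    Association("rs1333049", "coronary artery disease", 3e-9),
    Association("rs0000001", "example trait", 2e-4),
]
kept = catalog_associations(500_000, candidates)
```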
Accessing the data
The GWAS Catalog can be accessed:
• Via the “Diagram Browser”
  ◦ Implemented as a dynamic visualization on the human karyotype
  ◦ With links to study publications, SNPs in Ensembl and ontology terms in EFO (Experimental Factor Ontology)
• Via a web query search interface
  ◦ Provides tabular data for view or download
  ◦ Includes traits and links to study publications
• Via other GWAS-related data portals, such as
  ◦ Ensembl
  ◦ UCSC Genome Browser
  ◦ PheGenI
  ◦ GWAS Central
GWAS Components
The project is implemented in 3 main components:
1. Curation / Data loading pipeline
2. Data Publisher
3. Diagram Browser
[Architecture diagram; labels: Curation, SNP Batch Loader, PubMed Tracking, Publisher, Inference engine, Ontology Loading, Diagram Browser, Knowledge Base, Ontology Schema]
* The source code is managed under the GOCI (GWAS Ontology and Curation Infrastructure) project
Application Implementation
The following technologies have been used for this project:
• Java for server-side processing
• Spring for the MVC framework
• Maven for build automation and dependency management
• Apache Tomcat for the web server
• Oracle for the relational database
• HermiT for the OWL reasoner
• JavaScript / AJAX for Diagram Browser interactivity
• SVG for rendering vector graphics in the Diagram Browser
• Apache POI for processing spreadsheets
• ColdFusion for generating records for each SNP
ONTOLOGY SCHEMAS
Ontology schema needed
Before the project could be implemented, an ontology had to be designed for its components to operate. Working backwards:
• The Diagram Browser needs to display GWAS-related data in order to answer common GWAS use cases
• The Publisher needs to store data such that it can be reasoned over and served up to the Diagram Browser
• The Batch Loader needs to extract GWAS data from publications in a consistent manner for later retrieval by the Publisher
GWAS Catalog Ontology
Was created by mapping each trait to one or more terms in the Experimental Factor Ontology (EFO)
• At the start, 20% of GWAS traits were already in EFO
• SKOS was used to extend EFO for GWAS-specific views
• 500 new terms were added to create the GWAS-EFO-SKOS ontology
Reasons for using EFO
• It’s actively developed
• It’s well suited to cover the diversity of GWAS traits
Metrics
Number of classes              13,850
Number of individuals             370
Number of properties               50
Maximum depth                      15
Maximum # of children             700
Average # of children               7
Classes with a single child       500
Classes with > 25 children        100
Classes with no definition     13,500
* Note: GWAS Catalog Ontology and GWAS Diagram OWL have been used interchangeably
GWAS Catalog Ontology (cont.)
• Purpose: Models the relationships among the GWAS concepts “SNP”, “trait” and “chromosome” for the Diagram
• Location of ontology schemas used:
  ◦ EFO schema: http://www.ebi.ac.uk/efo
  ◦ GWAS-Diagram schema: http://www.ebi.ac.uk/efo/gwas-diagram
Class hierarchy
• GWAS study
• chromosome
  → chromosome 1..23
  → chromosome X, Y
• cytogenetic band
• single nucleotide polymorphism
• trait association
• experimental factor
Object property hierarchy
• has_part
• located_in
• location_of
• associated_with
• is_about
• has_about
• part_of
Data property hierarchy
• has_name
• has_snp_reference_id
• has_bp_position
• has_length
• has_p_value
• has_pubmed_id
• has_author
• has_publication_date
• has_gwas_trait_name
* Source: OntologyConstants.java; http://www.ebi.ac.uk/fgpt/gwas/ontology/gwas-diagram.owl
Field definitions for the OWL schema
1. SNP reference ID: A single nucleotide polymorphism identifier, as assigned by the Single Nucleotide Polymorphism Database (dbSNP).
2. Base pair position: The position, in base pairs, of a particular element on a genome.
3. Base pair length: The length, in base pairs, of any genomic element.
4. P-value: The probability of obtaining a test statistic at least as extreme as the one that was actually observed.
5. PubMed ID: The publication ID of a scientific paper, as assigned by the PubMed database.
6. Author: The primary author of a publication, usually expressed as surname followed by initial(s).
7. Publication date: The date on which a given entity was published.
8. GWAS trait name: An arbitrary text label used to record a GWAS trait name that does not specifically map to an ontology term. Usually this will be used to annotate instances of Experimental Factor in order to retain information about a trait that was not defined in the ontology.
9. Chromosomes: Chromosomes 1-23; Chromosomes X & Y.
10. Trait association: An association that can be asserted between two entities with a degree of confidence expressed as a p-value.
11. GWAS Study: A study, described by a scientific publication, that identifies genome-wide associations between single nucleotide polymorphisms and phenotypic traits or disorders.
Using SKOS for defining the GWAS Catalog ontology
SKOS (Simple Knowledge Organization System) was used to create the GWAS Catalog ontology by extending the EFO ontology, because SKOS:
• Requires less expertise, effort and cost, since it is less semantically strict and expressive than OWL
• Can be used where the complexity of inferences is limited
• Is easy to use for extending other vocabularies
Introduction to SKOS
SKOS is an area of work developing specifications and
standards to support the use of knowledge organization
systems (KOS) such as thesauri, classification schemes,
subject heading systems and taxonomies within the
framework of the Semantic Web.
Sample dataset generated by OWL API is broken into…
• Data Property Assertion
• Class Assertion
• Object Property Assertion
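To make the three assertion kinds concrete, here is a minimal sketch that sorts plain (subject, predicate, object) triples into them, using the object and data property names listed for the GWAS-Diagram schema. It does not use the OWL API; the node names, the band value and the `assertion_kind` helper are assumptions for illustration only.

```python
RDF_TYPE = "rdf:type"

# Property names as listed for the GWAS-Diagram schema.
OBJECT_PROPERTIES = {
    "has_part", "located_in", "location_of", "associated_with",
    "is_about", "has_about", "part_of",
}
DATA_PROPERTIES = {
    "has_name", "has_snp_reference_id", "has_bp_position", "has_length",
    "has_p_value", "has_pubmed_id", "has_author", "has_publication_date",
    "has_gwas_trait_name",
}


def assertion_kind(triple):
    """Name the kind of assertion a (subject, predicate, object) triple makes."""
    _, predicate, _ = triple
    if predicate == RDF_TYPE:
        return "ClassAssertion"
    if predicate in OBJECT_PROPERTIES:
        return "ObjectPropertyAssertion"
    if predicate in DATA_PROPERTIES:
        return "DataPropertyAssertion"
    raise ValueError(f"unknown predicate: {predicate}")


# Illustrative individuals; the node names are hypothetical.
sample = [
    ("snp/rs1333049", RDF_TYPE, "single nucleotide polymorphism"),
    ("snp/rs1333049", "located_in", "band/9p21.3"),
    ("snp/rs1333049", "has_snp_reference_id", "rs1333049"),
]
kinds = [assertion_kind(t) for t in sample]
```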
Advantage of ontology for traits
Using a predefined ontology for describing traits (rather than unstructured lists) allows:
1. More complex, compounded and context-dependent traits to be described
   ◦ e.g. “Type 2 diabetes and gout”; “Parkinson’s disease (interaction with caffeine)”
2. Creation of semantically meaningful links between traits
3. More complex and meaningful queries
Traits
• Phenotypes, e.g. hair & eye color
• Treatment responses, e.g. response to antineoplastic agents
• Diseases, e.g. type 2 diabetes
• Assays, e.g. glycosylated haemoglobin level
• Chemical/drug names, e.g. C-reactive protein
CURATION
The Curation process is partially automated
1. Run automated literature searches to capture eligible studies
2. Enter them into the system for review by curators
3. Triage and assign papers to curators
4. Curators use a web-based tracking and data entry system which allows multiple users to search, annotate, verify and publish the Catalog data. There are two levels of manual curation:
   a. First, all data are extracted by one curator.
   b. Some studies have more than 1,000 significant SNPs, so curators create spreadsheets of SNPs for batch loading into the DB (using the Apache POI Java API for Microsoft Documents and a ColdFusion extension).
   c. Then the data are double-checked for accuracy and consistency by another curator.
5. Run the automated pipeline that:
   a. Checks multiple data sources for accuracy, completeness and consistency: PubMed, dbSNP, and NCBI's Gene database
   b. Adds genomic annotation such as each SNP's base pair and cytogenetic location
[Workflow diagram: Literature search → ID eligible studies → Entry into workflow tool → Triage & assignment → Manual curation (data entry, check accuracy) → Automated pipeline (check against PubMed, dbSNP, NCBI; add annotation)]
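The automated checks in step 5 run against external sources (PubMed, dbSNP, NCBI Gene). As a rough, offline stand-in, the sketch below shows the kind of per-row consistency check that could precede those lookups. The field names, the `check_row` helper and the sample values are assumptions, not the GOCI pipeline's actual schema.

```python
import re

RS_ID = re.compile(r"rs\d+")      # dbSNP reference IDs look like "rs1333049"
PUBMED_ID = re.compile(r"\d+")    # PubMed IDs are numeric


def check_row(row):
    """Return the names of fields that fail basic format checks."""
    problems = []
    if not RS_ID.fullmatch(str(row.get("snp", ""))):
        problems.append("snp")
    if not PUBMED_ID.fullmatch(str(row.get("pubmed_id", ""))):
        problems.append("pubmed_id")
    p_value = row.get("p_value")
    # The Catalog only keeps associations with P < 1 x 10^-5.
    if not isinstance(p_value, (int, float)) or not 0 < p_value < 1e-5:
        problems.append("p_value")
    return problems


# Made-up rows for illustration.
good_row = {"snp": "rs1333049", "pubmed_id": "00000000", "p_value": 3e-9}
bad_row = {"snp": "1333049", "pubmed_id": "00000000", "p_value": 0.2}
```

A real pipeline would follow such format checks with lookups against the external databases, which this sketch does not attempt.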
Creation of links to external data sources
Each entry in the GWAS Catalog has links to supporting data sources for convenience
Reference Source: Sample Link / URI
• NCBI’s dbSNP: http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=1333049
• Ensembl: http://useast.ensembl.org/Homo_sapiens/Variation/Explore?r=9:22125003-22126003;v=rs1333049;vdb=variation;vf=1004336
• PubMed: http://www.ncbi.nlm.nih.gov/pubmed?Db=pubmed&DbFrom=snp&Cmd=Link&LinkName=snp_pubmed_cited&LinkReadableName=Pubmed+(SNP+Cited)&IdsFromResult=1333049
• OMIM: http://omim.org/entry/611139#0000
* Note that currently these are links for use by people, rather than machine-readable linkages that would allow querying across multiple data sources
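As a small illustration of how such a link can be derived from a SNP reference ID, the sketch below rebuilds the dbSNP sample link shown above. The URL pattern is taken from that sample; the function name and the input validation are assumptions.

```python
from urllib.parse import urlencode

# Base URL copied from the dbSNP sample link.
DBSNP_BASE = "http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi"


def dbsnp_link(rs_id):
    """Build a dbSNP link from an ID such as 'rs1333049'."""
    number = rs_id[2:] if rs_id.lower().startswith("rs") else rs_id
    if not number.isdigit():
        raise ValueError(f"not a dbSNP reference ID: {rs_id!r}")
    return DBSNP_BASE + "?" + urlencode({"rs": number})


link = dbsnp_link("rs1333049")
```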
Future: Opportunity for automating curation
• Machine learning and natural language processing (NLP) to categorize into traits defined in the GWAS Catalog ontology
• Assign categorization confidence metrics to assist the processing workflow
• Accuracy can be verified by humans based on highlighting and annotations provided by the NLP engine
[Diagram: NLP processing & confidence assignment → Workflow for human validation (where needed) → Knowledge Base]
PUBLISHER
Data flow for GOCI Publisher
1. Start with the Oracle relational database created by the Curation process
2. Java Publisher app converts from the relational database into OWL individuals
3. The result is a knowledge base in the format of the GWAS Catalog ontology, with 13,000 individuals and 43,000 axioms
4. OWL API and HermiT reasoner create inferences from the GWAS Catalog ontology
5. Since it takes > 10 hours to run the reasoner, the job is run in batch and results are cached in RAM
6. Results are retrieved by the Diagram Browser with requests to an app running on the Tomcat server
[Data flow diagram: Relational Database (Oracle) → Java Publisher job → Knowledge Base (OWL individuals / triples cached in RAM) → HermiT + OWL API → Knowledge Base with Inferred Triples (cached in RAM) → GWAS Diagram Browser; SPARQL Endpoint (future)]
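The conversion from relational rows to individuals can be sketched roughly as follows. This is not the Java Publisher: an in-memory SQLite table stands in for the Oracle database, plain tuples stand in for OWL axioms, and the table, column and node names, the placeholder PubMed ID and the way properties are attached are all assumptions. Only the class and property names come from the schema listed earlier.

```python
import sqlite3


def publish(conn):
    """Turn each association row into a handful of triples."""
    triples = []
    query = "SELECT snp, trait, p_value, pubmed_id FROM association ORDER BY snp"
    for snp, trait, p_value, pubmed_id in conn.execute(query):
        snp_node = f"snp/{snp}"
        assoc_node = f"association/{pubmed_id}/{snp}"
        triples += [
            (snp_node, "rdf:type", "single nucleotide polymorphism"),
            (snp_node, "has_snp_reference_id", snp),
            (assoc_node, "rdf:type", "trait association"),
            (assoc_node, "is_about", snp_node),
            (assoc_node, "has_gwas_trait_name", trait),
            (assoc_node, "has_p_value", p_value),
            (assoc_node, "has_pubmed_id", pubmed_id),
        ]
    return triples


# Hypothetical one-row database; the p-value and PubMed ID are placeholders.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE association (snp TEXT, trait TEXT, p_value REAL, pubmed_id TEXT)"
)
conn.execute(
    "INSERT INTO association VALUES (?, ?, ?, ?)",
    ("rs1333049", "coronary artery disease", 3e-9, "00000000"),
)
triples = publish(conn)
```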
Publisher’s output is to OWL triples
…because this format is preferable to having the Diagram Browser query a relational database. The benefits are:
• Additional inferences about SNP-trait associations
• More expressive queries
• Ability to detect errors or inconsistencies, as defined by the ontology

Using direct queries
• Data is an unstructured catalog of traits in a fixed relational schema
• Queries are limited to string pattern matching and must be run one at a time. It’s not possible to query for related or inferred traits.
• Example queries:
  ◦ Can search on trait name containing “diabetes” and get results for both type 1 and type 2 diabetes
  ◦ Comparison between gastric and esophageal cancers requires manually combining results from two distinct searches

Using OWL knowledge base
• Data is structured in semantic triples and reasoned over using an ontology
• Queries can include inferences and complex questions
• Example queries: *
  ◦ Find all SNPs that are associated with cancers located in the upper digestive tract
  ◦ Find all SNPs located on chromosomes 5, 7, 15 and 21 that are associated with diseases located in the urinary tract, with a p-value smaller than 10^-8
* Source: Welter, D., Burdett, T., et al. (2012) Ontology-driven visualization of NHGRI GWAS data
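The difference between the two query styles can be shown with a toy example. The hierarchy, the SNP IDs (`rsA`, `rsB`, `rsC`) and the helper names below are hypothetical, and a real knowledge base would use an OWL reasoner rather than a parent lookup. The point is that a string search for the category name finds nothing, while a hierarchy-aware search finds both cancers.

```python
# Hypothetical child -> parent trait hierarchy.
SUBCLASS_OF = {
    "gastric cancer": "upper digestive tract cancer",
    "esophageal cancer": "upper digestive tract cancer",
    "upper digestive tract cancer": "cancer",
}
# Hypothetical SNP -> trait annotations.
SNP_TRAITS = {
    "rsA": "gastric cancer",
    "rsB": "esophageal cancer",
    "rsC": "type 2 diabetes",
}


def string_search(text):
    """Direct query: match on the trait name only."""
    return sorted(s for s, t in SNP_TRAITS.items() if text in t)


def is_a(trait, ancestor):
    """Walk up the hierarchy from trait looking for ancestor."""
    while trait is not None:
        if trait == ancestor:
            return True
        trait = SUBCLASS_OF.get(trait)
    return False


def ontology_search(ancestor):
    """Ontology-backed query: match the trait or any of its subclasses."""
    return sorted(s for s, t in SNP_TRAITS.items() if is_a(t, ancestor))


by_string = string_search("upper digestive tract")
by_ontology = ontology_search("upper digestive tract cancer")
```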
HermiT OWL Reasoner
• HermiT is a reasoner for ontologies written using OWL (Web Ontology Language). It is also available as a Protégé plugin.
• HermiT can determine whether the ontology in a given OWL file is consistent and identify the relationships between classes
• HermiT passes all OWL 2 conformance tests for direct semantics reasoners
• HermiT can be accessed from Java apps through the OWL API
  ◦ OWL API is a Java interface for creating, manipulating and serializing OWL ontologies
  ◦ It includes parsers and writers for RDF/XML, OWL/XML and Turtle, as well as interfaces for working with reasoners
HermiT reasoner is used in a “forward chaining” style
• How it works: Rules are processed by the reasoner once in batch mode to generate and cache inferred triples
• Best when:
  ◦ Rules of inference and original data don’t change often
  ◦ There’s sufficient disk and RAM to store all the inferred triples
• Benefits: Retrieval queries run faster
• Limitation: When the rules or the explicit data set change, it may be necessary to empty and reload the entire data store and re-run the reasoner over it
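The precompute-and-cache idea can be sketched with a tiny rule set. This is not how HermiT works internally; it is only a minimal illustration of forward chaining over two assumed rules (subclass transitivity and type propagation), with made-up trait names.

```python
def materialize(asserted):
    """Apply the two rules until no new triples appear, then return them all."""
    kb = set(asserted)
    while True:
        subclass_pairs = [(s, o) for s, p, o in kb if p == "subClassOf"]
        new = set()
        for s, p, o in kb:
            if p not in ("type", "subClassOf"):
                continue
            for sub, sup in subclass_pairs:
                if sub == o:
                    # x type/subClassOf A and A subClassOf B => x type/subClassOf B
                    new.add((s, p, sup))
        if new <= kb:
            return kb
        kb |= new


# Hypothetical asserted triples.
asserted = {
    ("gastric cancer", "subClassOf", "upper digestive tract cancer"),
    ("upper digestive tract cancer", "subClassOf", "cancer"),
    ("trait_1", "type", "gastric cancer"),
}
kb = materialize(asserted)
```

After the batch run, a query such as "is `trait_1` a cancer?" is a plain membership test on `kb`, which is why retrieval is fast and why a change to the data means running `materialize` again.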
DIAGRAM BROWSER
What is the Diagram Browser?
It’s a diagram that shows SNP-trait associations mapped to the SNPs’ chromosomal locations on the human karyotype. This project has made significant improvements to it:
• Originally: The diagram was a static document created manually on a quarterly basis (by a medical illustrator)
• Now: Creation is fully automated with each study added, and the diagram is interactive, so that it can be explored dynamically
Diagram Browser: Interactive functionality
Clicking on SNP-associated trait
category enables selection of
only bands with relevant traits
Zoom in and hover over
chromosomes in order to see
traits by chromosomal location
Clicking on diagram displays all
SNPs for a trait and band
How is the Diagram Browser implemented?
1. The Diagram Browser is a JavaScript app
rendered on the client browser
2. Interaction with the diagram, such as filter,
zoom or click, generates a query
3. The query request is sent via AJAX from
the web client to the Tomcat server
4. The server runs a Java program that
converts this request into an OWL class
expression which is processed by the
reasoner
5. The query result causes a string of SVG
(Scalable Vector Graphics) code to be
generated
6. This code is sent back to the web client via
AJAX
7. The JavaScript app renders the SVG
provided
[Diagram: the numbered request flow between the Web Browser (JavaScript app) and the Web Server with its Knowledge Base (using GWAS Catalog ontology): trigger (filter, zoom, click) → generate AJAX request → process request → generate SVG → render SVG code]
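The SVG generation step on the server can be sketched as a function from a query result to a string of SVG. The real server is a Java app; this Python version, including the layout, the element choices and the `render_markers` name, is an assumption meant only to show the shape of the step.

```python
import xml.etree.ElementTree as ET


def render_markers(associations, radius=4):
    """Render one circle per (snp, trait) pair and return SVG as a string."""
    svg = ET.Element(
        "svg",
        {"xmlns": "http://www.w3.org/2000/svg", "width": "200", "height": "100"},
    )
    for index, (snp, trait) in enumerate(associations):
        circle = ET.SubElement(
            svg,
            "circle",
            {"cx": str(20 + index * 12), "cy": "50", "r": str(radius)},
        )
        # A <title> child is shown by browsers as a hover tooltip.
        title = ET.SubElement(circle, "title")
        title.text = f"{snp}: {trait}"
    return ET.tostring(svg, encoding="unicode")


# Illustrative query result; the second entry is made up.
svg_text = render_markers([
    ("rs1333049", "coronary artery disease"),
    ("rs0000001", "example trait"),
])
```

The returned string is what would travel back to the client in the AJAX response for the JavaScript app to insert into the page.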
THE FUTURE
Future scalability
Will run into scalability issues as…
• Size of knowledge base grows
• Tools for querying the knowledge base become more sophisticated
Current implementation → Short-term solution → Long-term solution
Short-term solution:
• Monitor system resources and increase where there are bottlenecks
• Limit queries to predefined ranges
• Precompute more inferences, based on query frequency
Long-term solution:
• Migrate from the in-memory knowledge base to a persistent RDF triplestore (such as Virtuoso)
• Implement a SPARQL endpoint for queries instead of using OWL class expressions
• Consider a backward chaining reasoner if the inferred data set gets too big to cache
Future “backward chaining” option
• How it works: Reasoner is deployed between the GWAS Diagram or SPARQL endpoint and the data store, so that inferred triples are generated in real time as part of the query result set
• Best when:
  ◦ Rules of inference and original data change often
  ◦ Disk or RAM is insufficient to store all the inferred triples
• Benefits: No need to re-run the reasoner when data or rules change
• Limitation: Query response may be slow
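A minimal sketch of the backward chaining idea: nothing is precomputed, and the subclass chain is walked only when a question is asked. The rule set (type propagation along subclass links), the triple layout and the trait names are assumptions for illustration.

```python
def has_type(asserted, individual, cls):
    """Answer 'is individual a cls?' by walking subclass links at query time."""
    pending = [o for s, p, o in asserted if s == individual and p == "type"]
    seen = set()
    while pending:
        current = pending.pop()
        if current == cls:
            return True
        if current in seen:
            continue
        seen.add(current)
        # Follow subClassOf edges upward from the current class.
        pending.extend(
            o for s, p, o in asserted if s == current and p == "subClassOf"
        )
    return False


# Hypothetical asserted triples; no inferred triples are stored.
asserted = {
    ("gastric cancer", "subClassOf", "upper digestive tract cancer"),
    ("upper digestive tract cancer", "subClassOf", "cancer"),
    ("trait_1", "type", "gastric cancer"),
}
answer = has_type(asserted, "trait_1", "cancer")
```

Each call repeats the walk, which is the trade-off named above: no re-run is needed when data changes, but every query pays the inference cost.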
SPARQL Example: GWAS Central
• Although the NHGRI project currently doesn’t host a live SPARQL endpoint, it could be set up to do so
• The GWAS Central project already does this. (It collates data from a range of sources, including the published literature and collaborating databases such as the NHGRI GWAS Catalog.)
SPARQL query page for GWAS Central: http://fuseki.gwascentral.org/query.html
SPARQL Example: EBI’s Atlas
• EBI hosts the GWAS Diagram, but doesn’t provide a SPARQL endpoint associated with that project
• It does, however, host SPARQL endpoints for multiple other projects, such as Atlas
SPARQL query page and multiple examples for EBI’s Atlas project (https://www.ebi.ac.uk/rdf/services/atlas/sparql)
GWAS Central: Towards Federation
• GWAS Central is a comprehensive resource for the comparison and interrogation of multiple GWAS (genome-wide association study) projects
• Allows for storage, mining and display of summary-level association data
• More comprehensive than other openly available projects with a similar focus (i.e., millions vs. thousands of p-values)
• Provides user tools and interfaces not previously available from a single resource
• Aggregates other related resources:
  ◦ GWAS Catalog
  ◦ OADGAR
  ◦ SNPedia
• GWAS Central platform is available for adoption by other institutes, consortia, teams and countries
  ◦ Ideally, multiple implementations can be federated to allow searching across multiple data sets
GWAS Central: Towards Federation (cont.)
Comparison of features for GWAS Central, GWAS Catalog,
OADGAR*, SNPedia
* Open Access Database of Genome-wide Association Results
GWAS Central: Towards Federation (cont.)
SPARQL can be used to express queries across diverse data sources, whether
the data is stored natively as RDF or viewed as RDF via middleware. This
specification defines the syntax and semantics of SPARQL 1.1 Federated
Query extension for executing queries distributed over different SPARQL
endpoints.
The SERVICE keyword extends SPARQL 1.1 to support queries that merge
data distributed across the Web.
Source: http://www.w3.org/TR/sparql11-federated-query/
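A federated query built around the SERVICE keyword might be assembled as in the sketch below. The `ex:` vocabulary, its predicate names and the term IRI are placeholders invented for the example, and the endpoint is the Atlas URL quoted earlier. Nothing is sent over the network, the function only builds the query text.

```python
def federated_query(remote_endpoint, term_iri):
    """Build a SPARQL 1.1 query that joins local data with a remote endpoint."""
    return f"""PREFIX ex: <http://example.org/gwas#>
SELECT ?snp ?remote WHERE {{
  ?association ex:is_about ?snp ;
               ex:has_trait <{term_iri}> .
  SERVICE <{remote_endpoint}> {{
    ?remote ex:annotated_with <{term_iri}> .
  }}
}}"""


# The term IRI is a placeholder, not a real EFO identifier.
query = federated_query(
    "https://www.ebi.ac.uk/rdf/services/atlas/sparql",
    "http://example.org/efo/EFO_PLACEHOLDER",
)
```

The patterns outside the SERVICE block are evaluated against the local store, and the block inside it is sent to the remote endpoint, with the shared term IRI acting as the join key.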
Setting up GWAS Catalog project to query across data sets
Querying across databases using EFO: Since the GWAS Catalog is based on EFO, it’s possible for a query to include other biomedical databases annotated with EFO: ArrayExpress, Ensembl, BioSamples, PRIDE, etc.
Querying across databases using other ontologies: Even if EFO is not used, cross-reference (definition citation) annotations allow querying across ontologies. The ID of an external class is added as an annotation on the relevant EFO term.
Example: Connective tissue is an EFO term that has been mapped to terms in other ontologies, such as BTO:0000421, the identifier for connective tissue in the BRENDA Tissue Ontology.
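The cross-reference mechanism amounts to a lookup from a term to its identifiers in other ontologies. In the sketch below the term is keyed by its label because the EFO identifier is not given here; the dictionary layout and the `external_id` helper are assumptions, and only BTO:0000421 comes from the example above.

```python
# Cross-reference annotations, keyed by EFO term label (layout is hypothetical).
XREFS = {
    "connective tissue": {"BTO": "BTO:0000421"},
}


def external_id(efo_label, ontology_prefix):
    """Return the mapped ID in another ontology, or None if there is no mapping."""
    return XREFS.get(efo_label, {}).get(ontology_prefix)


bto_id = external_id("connective tissue", "BTO")
```

A query against a BTO-annotated data source could then substitute `bto_id` for the EFO term.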
THANKS!

The Research Object Initiative: Frameworks and Use Cases
 
SemFacet paper
SemFacet paperSemFacet paper
SemFacet paper
 
Sem facet paper
Sem facet paperSem facet paper
Sem facet paper
 
Global RDF Descriptors for Germplasm Data
Global RDF Descriptors for Germplasm DataGlobal RDF Descriptors for Germplasm Data
Global RDF Descriptors for Germplasm Data
 
Finding knowledge, data and answers on the Semantic Web
Finding knowledge, data and answers on the Semantic WebFinding knowledge, data and answers on the Semantic Web
Finding knowledge, data and answers on the Semantic Web
 
Apollo annotation guidelines for i5k projects Diaphorina citri
Apollo annotation guidelines for i5k projects Diaphorina citriApollo annotation guidelines for i5k projects Diaphorina citri
Apollo annotation guidelines for i5k projects Diaphorina citri
 
Vital AI: Big Data Modeling
Vital AI: Big Data ModelingVital AI: Big Data Modeling
Vital AI: Big Data Modeling
 
Karyotype DAS client
Karyotype DAS clientKaryotype DAS client
Karyotype DAS client
 
Research Object Community Update
Research Object Community UpdateResearch Object Community Update
Research Object Community Update
 
Yde de Jong & Dave Roberts - ZooBank and EDIT: Towards a business model for Z...
Yde de Jong & Dave Roberts - ZooBank and EDIT: Towards a business model for Z...Yde de Jong & Dave Roberts - ZooBank and EDIT: Towards a business model for Z...
Yde de Jong & Dave Roberts - ZooBank and EDIT: Towards a business model for Z...
 
BioSD Tutorial 2014 Editition
BioSD Tutorial 2014 EdititionBioSD Tutorial 2014 Editition
BioSD Tutorial 2014 Editition
 
Representing and reasoning with biological knowledge
Representing and reasoning with biological knowledgeRepresenting and reasoning with biological knowledge
Representing and reasoning with biological knowledge
 
Resource Description Framework Approach to Data Publication and Federation
Resource Description Framework Approach to Data Publication and FederationResource Description Framework Approach to Data Publication and Federation
Resource Description Framework Approach to Data Publication and Federation
 
FAIRer Research
FAIRer ResearchFAIRer Research
FAIRer Research
 
OSFair2017 Workshop | How FAIR friendly is the FAIRDOM Hub? Exposing metadata...
OSFair2017 Workshop | How FAIR friendly is the FAIRDOM Hub? Exposing metadata...OSFair2017 Workshop | How FAIR friendly is the FAIRDOM Hub? Exposing metadata...
OSFair2017 Workshop | How FAIR friendly is the FAIRDOM Hub? Exposing metadata...
 
2008 11 13 Hcls Call
2008 11 13 Hcls Call2008 11 13 Hcls Call
2008 11 13 Hcls Call
 
COPO kick-off meeting
COPO kick-off meetingCOPO kick-off meeting
COPO kick-off meeting
 

More from David Portnoy

DDOD framework infographic
DDOD framework infographicDDOD framework infographic
DDOD framework infographicDavid Portnoy
 
Impact of DDOD on Data Quality - White House 2016
Impact of DDOD on Data Quality -  White House 2016Impact of DDOD on Data Quality -  White House 2016
Impact of DDOD on Data Quality - White House 2016David Portnoy
 
Industry Uses of HHS Data
Industry Uses of HHS DataIndustry Uses of HHS Data
Industry Uses of HHS DataDavid Portnoy
 
Open Data Discoverability
Open Data DiscoverabilityOpen Data Discoverability
Open Data DiscoverabilityDavid Portnoy
 
DDOD for FOIA organizations
DDOD for FOIA organizationsDDOD for FOIA organizations
DDOD for FOIA organizationsDavid Portnoy
 
Intro to Demand-Driven Open Data for Data Owners
Intro to Demand-Driven Open Data for Data OwnersIntro to Demand-Driven Open Data for Data Owners
Intro to Demand-Driven Open Data for Data OwnersDavid Portnoy
 
Intro to Demand Driven Open Data for Data Users
Intro to Demand Driven Open Data for Data UsersIntro to Demand Driven Open Data for Data Users
Intro to Demand Driven Open Data for Data UsersDavid Portnoy
 
Hybrid Data Warehouse Hadoop Implementations
Hybrid Data Warehouse Hadoop ImplementationsHybrid Data Warehouse Hadoop Implementations
Hybrid Data Warehouse Hadoop ImplementationsDavid Portnoy
 
Agile Business Intelligence
Agile Business IntelligenceAgile Business Intelligence
Agile Business IntelligenceDavid Portnoy
 
Comparison of MPP Data Warehouse Platforms
Comparison of MPP Data Warehouse PlatformsComparison of MPP Data Warehouse Platforms
Comparison of MPP Data Warehouse PlatformsDavid Portnoy
 

More from David Portnoy (10)

DDOD framework infographic
DDOD framework infographicDDOD framework infographic
DDOD framework infographic
 
Impact of DDOD on Data Quality - White House 2016
Impact of DDOD on Data Quality -  White House 2016Impact of DDOD on Data Quality -  White House 2016
Impact of DDOD on Data Quality - White House 2016
 
Industry Uses of HHS Data
Industry Uses of HHS DataIndustry Uses of HHS Data
Industry Uses of HHS Data
 
Open Data Discoverability
Open Data DiscoverabilityOpen Data Discoverability
Open Data Discoverability
 
DDOD for FOIA organizations
DDOD for FOIA organizationsDDOD for FOIA organizations
DDOD for FOIA organizations
 
Intro to Demand-Driven Open Data for Data Owners
Intro to Demand-Driven Open Data for Data OwnersIntro to Demand-Driven Open Data for Data Owners
Intro to Demand-Driven Open Data for Data Owners
 
Intro to Demand Driven Open Data for Data Users
Intro to Demand Driven Open Data for Data UsersIntro to Demand Driven Open Data for Data Users
Intro to Demand Driven Open Data for Data Users
 
Hybrid Data Warehouse Hadoop Implementations
Hybrid Data Warehouse Hadoop ImplementationsHybrid Data Warehouse Hadoop Implementations
Hybrid Data Warehouse Hadoop Implementations
 
Agile Business Intelligence
Agile Business IntelligenceAgile Business Intelligence
Agile Business Intelligence
 
Comparison of MPP Data Warehouse Platforms
Comparison of MPP Data Warehouse PlatformsComparison of MPP Data Warehouse Platforms
Comparison of MPP Data Warehouse Platforms
 

Recently uploaded

Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Hyundai Motor Group
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetEnjoy Anytime
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 

Recently uploaded (20)

Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 

Case Study in Linked Data and Semantic Web: Human Genome

  • 1. Case study in Linked Data and Semantic Web for the Human Genome domain: NHGRI’s “GWAS Catalog” Project (National Human Genome Research Institute)
      David Portnoy, http://LinkedIn.com/in/DavidPortnoy, 312.970.9740
      © Copyright 2012-2014 Datalytx, Inc.
  • 2. About the Project
      • Project Name: The National Human Genome Research Institute (NHGRI) Catalog of Published Genome-Wide Association Studies (GWAS Catalog)
      • Project Description: Manually curated collection of published GWAS assaying at least 100,000 single-nucleotide polymorphisms (SNPs), and of all SNP-trait associations with P < 1 × 10⁻⁵.
      • In addition to SNP-trait association data, the project provides the “Diagram Browser”, an interactive diagram of these associations mapped to the SNPs’ chromosomal locations.
      Stats as of Aug 2014:
      • Almost 2,000 GWAS-related publications
      • Over 14,000 SNPs
      [Chart: Project Growth, 2005 to 2014 (# of studies, # of traits, SNP-trait associations)]
      Website: http://www.genome.gov/gwastudies/
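The two inclusion criteria above (at least 100,000 SNPs assayed, and P < 1 × 10⁻⁵ per association) can be read as a simple filter. The sketch below is only an illustration of that reading; the `Association` record and the function name are hypothetical and are not taken from the GOCI code base.

```python
from dataclasses import dataclass

MIN_SNPS_ASSAYED = 100_000  # study-level threshold from the project description
MAX_P_VALUE = 1e-5          # association-level threshold: P < 1 x 10^-5


@dataclass
class Association:
    """Hypothetical record for one SNP-trait association."""
    snp_id: str
    trait: str
    p_value: float


def eligible_associations(snps_assayed, associations):
    """Return the associations that satisfy both catalog criteria."""
    if snps_assayed < MIN_SNPS_ASSAYED:
        return []  # the study itself does not qualify
    return [a for a in associations if a.p_value < MAX_P_VALUE]
```

A study that assays too few SNPs contributes nothing, regardless of how small its p-values are.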
  • 3. Accessing the data
      The GWAS Catalog can be accessed:
      • Via the “Diagram Browser”
          ◦ Implemented as a dynamic visualization on the human karyotype
          ◦ With links to study publications, SNPs in Ensembl and ontology terms in EFO (Experimental Factor Ontology)
      • Via a web query search interface
          ◦ Provides tabular data for view or download
          ◦ Includes traits and links to study publications
      • Via other GWAS-related data portals, such as
          ◦ Ensembl
          ◦ UCSC Genome Browser
          ◦ PheGenI
          ◦ GWAS Central
  • 4. GWAS Components
      The project is implemented in 3 main components:
      1. Curation / Data loading pipeline
      2. Data Publisher
      3. Diagram Browser
      [Diagram labels: Curation, SNP Batch Loader, PubMed Tracking, Publisher, Inference engine, Ontology Loading, Diagram Browser, Knowledge Base, Ontology Schema]
      * The source code is managed under the GOCI (GWAS Ontology and Curation Infrastructure) project
  • 5. Application Implementation
      The following technologies have been used for this project:
      • Java for server-side processing
      • Spring for MVC framework
      • Maven for build automation and dependency management
      • Apache Tomcat for web server
      • Oracle for relational database
      • HermiT for OWL reasoner
      • JavaScript / AJAX for Diagram Browser interactivity
      • SVG for rendering vector graphics in the Diagram Browser
      • Apache POI for processing spreadsheets
      • ColdFusion for generating records for each SNP
      * The source code is managed under the GOCI (GWAS Ontology and Curation Infrastructure) project
  • 7. Ontology schema needed
      Before the project could be implemented, an ontology had to be designed for its components to operate. Working backwards:
      • The Diagram Browser needs to display GWAS-related data in order to answer common GWAS use cases
      • The Publisher needs to store data such that it can be reasoned over and served up to the Diagram Browser
      • The Batch Loader needs to extract GWAS data from publications in a consistent manner for later retrieval by the Publisher
  • 8. GWAS Catalog Ontology
      The ontology was created by mapping each trait to one or more terms in the Experimental Factor Ontology (EFO):
      • At the start, 20% of GWAS traits were already in EFO
      • SKOS was used to extend EFO for GWAS-specific views
      • 500 new terms were added to create the GWAS-EFO-SKOS ontology
      Reasons for using EFO:
      • It’s actively developed
      • It’s well suited to cover the diversity of GWAS traits
      Metrics:
      Number of classes              13,850
      Number of individuals          370
      Number of properties           50
      Maximum depth                  15
      Maximum # of children          700
      Average # of children          7
      Classes with a single child    500
      Classes with > 25 children     100
      Classes with no definition     13,500
      * Note: GWAS Catalog Ontology and GWAS Diagram OWL have been used interchangeably
  • 9. GWAS Catalog Ontology (cont.)
      • Purpose: Models the relationships between the GWAS concepts of “SNP”, “trait” and “chromosome” for the Diagram
      • Location of ontology schemas used:
          ◦ EFO schema: http://www.ebi.ac.uk/efo
          ◦ GWAS-Diagram schema: http://www.ebi.ac.uk/efo/gwas-diagram
      Class hierarchy:
      • GWAS study
      • chromosome
          ◦ chromosome 1..23
          ◦ Chromosome X, Y
      • cytogenetic band
      • single nucleotide polymorphism
      • trait association
      • experimental factor
      Object property hierarchy:
      • has_part
      • located_in
      • location_of
      • associated_with
      • is_about
      • has_about
      • part_of
      Data property hierarchy:
      • has_name
      • has_snp_reference_id
      • has_bp_position
      • has_length
      • has_p_value
      • has_pubmed_id
      • has_author
      • has_publication_date
      • has_gwas_trait_name
      * Source: OntologyConstants.java; http://www.ebi.ac.uk/fgpt/gwas/ontology/gwas-diagram.owl
  • 10. Field definitions for the OWL schema
      1. SNP reference ID: A single nucleotide polymorphism identifier, as assigned by the Single Nucleotide Polymorphism Database (dbSNP).
      2. Base pair position: The position, in base pairs, of a particular element on a genome.
      3. Base pair length: The length, in base pairs, of any genomic element.
      4. P-value: The probability of obtaining a test statistic at least as extreme as the one that was actually observed.
      5. PubMed ID: The publication ID of a scientific paper, as assigned by the PubMed database.
      6. Author: The primary author of a publication, usually expressed as surname followed by initial(s).
      7. Publication date: A date on which a given entity was published.
      8. GWAS trait name: An arbitrary text label used to record a GWAS trait name that does not specifically map to an ontology term. Usually this will be used to annotate instances of Experimental Factor in order to retain information about a trait that was not defined in the ontology.
      9. Chromosomes: Chromosome 1-23; Chromosomes X & Y.
      10. Trait association: An association that can be asserted between two entities with a degree of confidence expressed as a p-value.
      11. GWAS Study: A study, described by a scientific publication, that identifies genome-wide associations between single nucleotide polymorphisms and phenotypic traits or disorders.
  • 11. Using SKOS for defining the GWAS Catalog ontology
      SKOS (Simple Knowledge Organization System) was used to create the GWAS Catalog ontology by extending the EFO ontology, because it:
      • Requires less expertise, effort and cost, since it is less semantically strict and expressive than OWL
      • Can be used where the complexity of inferences is limited
      • Is easy to use for extending other vocabularies
      Introduction to SKOS: SKOS is an area of work developing specifications and standards to support the use of knowledge organization systems (KOS) such as thesauri, classification schemes, subject heading systems and taxonomies within the framework of the Semantic Web.
  • 12. Sample dataset generated by OWL API is broken into…
      • Data Property Assertion
      • Class Assertion
      • Object Property Assertion
  • 13. Advantage of ontology for traits
      Using a predefined ontology for describing traits (rather than unstructured lists) allows:
      1. More complex, compounded and context-dependent traits to be described
          ◦ e.g. “Type 2 diabetes and gout”; “Parkinson’s disease (interaction with caffeine)”
      2. Creation of semantically meaningful links between traits
      3. More complex and meaningful queries
      Traits:
      • Phenotypes, e.g. hair & eye color
      • Treatment responses, e.g. response to antineoplastic agents
      • Diseases, e.g. type 2 diabetes
      • Assays, e.g. glycosylated haemoglobin level
      • Chemical/drug names, e.g. C-reactive protein
  • 15. The Curation process is partially automated
      1. Run automated literature searches to capture eligible studies
      2. Enter them into the system for review by curators
      3. Triage and assign papers to a curator
      4. Curators use a web-based tracking and data entry system which allows multiple users to search, annotate, verify and publish the Catalog data. There are two levels of manual curation:
          a. First, all data are extracted by one curator.
          b. Some studies can have more than 1,000 significant SNPs, so curators create spreadsheets of SNPs for batch loading into the DB (using the Apache POI Java API for Microsoft Documents and a ColdFusion extension).
          c. Then the data are double-checked for accuracy and consistency by another curator.
      5. Run the automated pipeline that:
          a. Checks multiple data sources for accuracy, completeness and consistency: PubMed, dbSNP, and NCBI's Gene database
          b. Adds genomic annotation such as the SNP's base pair and cytogenetic location
      [Flow diagram: Literature search → ID eligible studies → Entry into workflow tool → Triage & assignment → Manual curation (data entry, check accuracy) → Automated pipeline (check against PubMed, dbSNP, NCBI; add annotation)]
  • 16. Creation of links to external data sources
      Each entry in the GWAS Catalog has links to supporting data sources for convenience.
      Reference Source    Sample Link / URI
      NCBI’s dbSNP        http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=1333049
      Ensembl             http://useast.ensembl.org/Homo_sapiens/Variation/Explore?r=9:22125003-22126003;v=rs1333049;vdb=variation;vf=1004336
      PubMed              http://www.ncbi.nlm.nih.gov/pubmed?Db=pubmed&DbFrom=snp&Cmd=Link&LinkName=snp_pubmed_cited&LinkReadableName=Pubmed+(SNP+Cited)&IdsFromResult=1333049
      OMIM                http://omim.org/entry/611139#0000
      * Note that currently these are links for use by people, rather than machine-readable linkages that would allow querying across multiple data sources
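As a rough illustration of how such per-SNP links could be generated, the sketch below rebuilds the dbSNP and PubMed sample links above from an rs identifier. The URL patterns are copied from the sample links on this slide and may no longer resolve; the function names are hypothetical and do not come from the GOCI code.

```python
import re
from urllib.parse import urlencode

# Base URLs taken from the sample links on the slide.
DBSNP_BASE = "http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi"
PUBMED_BASE = "http://www.ncbi.nlm.nih.gov/pubmed"


def rs_number(rs_id):
    """Extract the numeric part of an identifier such as 'rs1333049'."""
    match = re.fullmatch(r"rs(\d+)", rs_id)
    if match is None:
        raise ValueError(f"not an rs identifier: {rs_id!r}")
    return match.group(1)


def dbsnp_link(rs_id):
    """Build the dbSNP reference link for one SNP."""
    return f"{DBSNP_BASE}?{urlencode({'rs': rs_number(rs_id)})}"


def pubmed_cited_link(rs_id):
    """Build the 'PubMed articles citing this SNP' link."""
    params = {
        "Db": "pubmed",
        "DbFrom": "snp",
        "Cmd": "Link",
        "LinkName": "snp_pubmed_cited",
        "LinkReadableName": "Pubmed (SNP Cited)",
        "IdsFromResult": rs_number(rs_id),
    }
    # Keep parentheses unescaped so the result matches the sample link.
    return f"{PUBMED_BASE}?{urlencode(params, safe='()')}"
```

Calling `dbsnp_link("rs1333049")` reproduces the first sample link in the table. Links built this way remain human-oriented, as the slide's note points out.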
  • 17. Future: Opportunity for automating curation
      • Machine learning and natural language processing (NLP) to categorize into traits defined in the GWAS Catalog ontology
      • Assign categorization confidence metrics to assist the processing workflow
      • Accuracy can be verified by humans based on highlighting and annotations provided by the NLP engine
      [Flow diagram: NLP processing & confidence assignment → Workflow for human validation (where needed) → Knowledge Base]
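The proposed workflow amounts to routing each automated categorization by its confidence score. A minimal sketch of that routing follows; the threshold value, the tuple layout and the function name are assumptions made for illustration, since the slide describes a future option rather than an existing implementation.

```python
REVIEW_THRESHOLD = 0.90  # hypothetical cut-off; the slide does not specify one


def route_predictions(predictions, threshold=REVIEW_THRESHOLD):
    """Split (paper_id, trait, confidence) tuples into two work queues.

    High-confidence categorizations go straight to the knowledge base;
    the rest are queued for human validation.
    """
    accepted, needs_review = [], []
    for paper_id, trait, confidence in predictions:
        target = accepted if confidence >= threshold else needs_review
        target.append((paper_id, trait))
    return accepted, needs_review
```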
  • 19. Data flow for GOCI Publisher
      • Start with the Oracle relational database created by the Curation process
      • The Java Publisher app converts from the relational database into OWL individuals
      • The result is a knowledge base in the format of the GWAS Catalog ontology, with 13,000 individuals and 43,000 axioms
      • OWL API and the HermiT reasoner create inferences from the GWAS Catalog ontology
      • Since it takes > 10 hours to run the reasoner, the job is run in batch and results are cached in RAM
      • Results are retrieved by the Diagram Browser with requests to an app running on the Tomcat server
      [Diagram labels: Relational Database (Oracle); Java Publisher job; Knowledge Base (OWL individuals / triples cached in RAM); HermiT + OWL API; Knowledge Base with Inferred Triples (Cached in RAM); SPARQL Endpoint (future); GWAS Diagram Browser]
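The conversion step (relational rows into OWL individuals) can be pictured as emitting class, object property and data property assertions per row. The sketch below uses plain Python tuples rather than the OWL API; the row columns, the identifier scheme and the way properties are attached to individuals are assumptions, and only the class and property names are taken from the ontology listing on slide 9.

```python
def row_to_triples(row):
    """Turn one hypothetical relational row into subject-predicate-object triples."""
    snp = f"snp/{row['rs_id']}"
    association = f"association/{row['association_id']}"
    study = f"study/{row['pubmed_id']}"
    return [
        # Class assertions
        (snp, "type", "single nucleotide polymorphism"),
        (association, "type", "trait association"),
        (study, "type", "GWAS study"),
        # Object property assertions (linking individuals)
        (association, "is_about", snp),
        (association, "part_of", study),
        # Data property assertions (literal values)
        (snp, "has_snp_reference_id", row["rs_id"]),
        (association, "has_p_value", row["p_value"]),
        (association, "has_gwas_trait_name", row["trait"]),
        (study, "has_pubmed_id", row["pubmed_id"]),
    ]
```

A publisher job in this style would loop over all rows, collect the triples into one knowledge base, and then hand that to the reasoner.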
  • 20. Publisher’s output is OWL triples…
      …because this format is preferable to having the Diagram Browser query a relational database. The benefits are:
      • Additional inferences about SNP-trait associations
      • More expressive queries
      • Ability to detect errors or inconsistencies, as defined by the ontology
      Using direct queries:
      • Data has an unstructured catalog of traits and is in a fixed relational schema
      • Queries can only use string pattern matching and must be done one at a time. It’s not possible to query for related or inferred traits.
      • Example queries:
          ◦ Can search on trait name containing “diabetes” and get results for both type 1 and type 2 diabetes
          ◦ Comparison between gastric and esophageal cancers requires manually combining results from two distinct searches
      Using OWL knowledge base:
      • Data is structured in semantic triples and reasoned over using an ontology
      • Queries can include inferences and complex questions
      • Example queries: *
          ◦ Find all SNPs that are associated with cancers located in the upper digestive tract
          ◦ Find all SNPs located on chromosomes 5, 7, 15 and 21 that are associated with diseases located in the urinary tract, with a p-value smaller than 10⁻⁸
      * Source: Welter, D., Burdett, T., et al. (2012) Ontology-driven visualization of NHGRI GWAS data
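The contrast between string matching and ontology-backed queries can be shown with a toy example. The trait hierarchy, SNP identifiers and function names below are invented for illustration and are not taken from EFO or the GWAS Catalog; the point is only that a class-based query finds traits whose names share no text with the query.

```python
# Hypothetical fragment of a trait hierarchy (child -> parent).
SUBCLASS_OF = {
    "gastric cancer": "upper digestive tract cancer",
    "esophageal cancer": "upper digestive tract cancer",
    "upper digestive tract cancer": "cancer",
    "type 2 diabetes": "diabetes mellitus",
}

# Hypothetical SNP-trait associations.
ASSOCIATIONS = [
    ("rsA", "gastric cancer"),
    ("rsB", "esophageal cancer"),
    ("rsC", "type 2 diabetes"),
]


def is_a(trait, ancestor):
    """Walk up the hierarchy to test whether trait falls under ancestor."""
    while trait is not None:
        if trait == ancestor:
            return True
        trait = SUBCLASS_OF.get(trait)
    return False


def snps_by_string(pattern):
    """Direct query: substring match on the trait name only."""
    return sorted(snp for snp, trait in ASSOCIATIONS if pattern in trait)


def snps_by_class(ancestor):
    """Ontology query: match any trait that is a kind of ancestor."""
    return sorted(snp for snp, trait in ASSOCIATIONS if is_a(trait, ancestor))
```

Here `snps_by_string("upper digestive tract")` finds nothing, because neither trait name contains that text, while `snps_by_class("upper digestive tract cancer")` returns both cancer SNPs in one query.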
  • 21. HermiT OWL Reasoner
      • HermiT is a reasoner for ontologies written using OWL (Web Ontology Language). It is available as a Protégé plugin.
      • HermiT can determine whether the ontology for any given OWL file is consistent and identify the relationships between classes
      • HermiT passes all OWL 2 conformance tests for direct semantics reasoners
      • HermiT can be accessed from Java apps through the OWL API
      • OWL API is a Java interface for creating, manipulating and serializing OWL ontologies
      • It includes parsers and writers for RDF, OWL and Turtle, as well as an interface for working with reasoners
  • 22. HermiT reasoner is used with “forward chaining”
      • How it works: Rules are processed by the reasoner once in batch mode to generate and cache inferred triples
      • Best when:
          ◦ Rules of inference and original data don’t change often
          ◦ There’s sufficient disk and RAM to store all the inferred triples
      • Benefits: Retrieval queries run faster
      • Limitation: When the rules or the explicit data set change, it may be necessary to empty and reload the entire data store and re-run the reasoner over it
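The batch-and-cache approach described above can be sketched as a fixed-point loop that keeps applying rules until no new triples appear. This is a generic illustration of forward chaining with two simple rules (subclass transitivity and type inheritance); it is not how HermiT works internally, and the triples in the usage note are invented.

```python
def materialize(asserted):
    """Apply two rules repeatedly until no new triples are produced.

    Rule 1: (A subClassOf B) and (B subClassOf C)  =>  (A subClassOf C)
    Rule 2: (x type B)       and (B subClassOf C)  =>  (x type C)
    """
    inferred = set(asserted)
    while True:
        new = set()
        for s, p, o in inferred:
            if p not in ("type", "subClassOf"):
                continue
            for s2, p2, o2 in inferred:
                if p2 == "subClassOf" and s2 == o:
                    new.add((s, p, o2))
        if new <= inferred:
            return inferred  # fixed point reached: everything is cached
        inferred |= new
```

After one (potentially slow) call to `materialize`, answering a question such as "is assoc1 a cancer association?" is a plain set lookup, `("assoc1", "type", "cancer") in cache`, which is why retrieval is fast and why any change to rules or data means recomputing the cache.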
  • 24. What is the Diagram Browser?
      It’s a diagram that shows SNP-trait associations mapped to the SNPs’ chromosomal locations on the human karyotype. This project has made significant improvements to it:
      • Originally: The diagram was a static document created manually on a quarterly basis (by a medical illustrator)
      • Now: Creation is fully automated with each study added, and the diagram is interactive, so that it can be explored dynamically
  • 25. Diagram Browser: Interactive functionality
      • Clicking on a SNP-associated trait category enables selection of only bands with relevant traits
      • Zoom in and hover over chromosomes in order to see traits by chromosomal location
      • Clicking on the diagram displays all SNPs for a trait and band
  • 26. How is the Diagram Browser implemented?
      1. The Diagram Browser is a JavaScript app rendered on the client browser
      2. Interaction with the diagram, such as filter, zoom or click, generates a query
      3. The query request is sent via AJAX from the web client to the Tomcat server
      4. The server runs a Java program that converts this request into an OWL class expression, which is processed by the reasoner
      5. The query result causes a string of SVG (Scalable Vector Graphics) code to be generated
      6. This code is sent back to the web client via AJAX
      7. The JavaScript app renders the SVG provided
      [Diagram: Web Browser (JavaScript app), Web Server, Knowledge Base (using GWAS Catalog ontology); steps 1 to 7: Trigger (filter, zoom, click), Generate AJAX request, Process request, Generate SVG, Render SVG code]
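Step 5, turning a query result into a string of SVG, might look roughly like the sketch below. The marker tuple layout, the canvas size and the function name are assumptions for illustration; the actual server-side code is Java and is not shown in the slides.

```python
from xml.sax.saxutils import escape


def render_svg(markers, width=200, height=100):
    """Build an SVG string with one circle per (x, y, trait_label) marker.

    The markers stand in for whatever the reasoner query returned.
    """
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">'
    ]
    for x, y, label in markers:
        # escape() protects labels containing characters such as '&' or '<'.
        parts.append(
            f'<circle cx="{x}" cy="{y}" r="3"><title>{escape(label)}</title></circle>'
        )
    parts.append("</svg>")
    return "".join(parts)
```

The returned string is what would travel back to the browser in step 6 for the JavaScript app to insert into the page.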
  • 28. Future scalability
      Will run into scalability issues as…
      • Size of knowledge base grows
      • Tools for querying the knowledge base become more sophisticated
      Current Implementation / Short term solution / Long Term Solution:
      • Monitor system resources and increase where there are bottlenecks
      • Limit queries to predefined ranges
      • Precompute more inferences, based on query frequency
      • Migrate to a persistent RDF triplestore (such as Virtuoso) from the knowledge base
      • Implement a SPARQL endpoint for queries instead of using OWL class expressions
      • Consider a backward chaining reasoner if the inferred data set gets too big to cache
  • 29. Future “backward chaining” option. How it works: the reasoner is deployed between the GWAS Diagram or SPARQL endpoint and the data store, so that inferred triples are generated in real time as part of the query result set. Best when: the rules of inference and the original data change often; disk or RAM is insufficient to store all the inferred triples. Benefits: no need to re-run the reasoner when data or rules change. Limitation: query response may be slow.
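The trade-off can be illustrated with a toy subclass hierarchy. In the sketch below, `materialize` computes every inferred fact up front (forward chaining, the cache-everything approach), while `holds` starts from the question being asked and reduces it to sub-goals at query time (backward chaining). The data and function names are made up for illustration, and the hierarchy is assumed to be acyclic.

```python
# Toy data: child class -> parent class, and individual -> asserted class.
SUBCLASS_OF = {
    "asthma": "respiratory disease",
    "respiratory disease": "disease",
}
ASSERTED = {"assoc1": "asthma"}


def materialize(asserted, subclass_of):
    """Forward chaining: store every (individual, class) pair in advance."""
    inferred = set()
    for individual, cls in asserted.items():
        while cls is not None:
            inferred.add((individual, cls))
            cls = subclass_of.get(cls)
    return inferred


def holds(individual, target, asserted, subclass_of):
    """Backward chaining: prove the goal 'individual is a target' on demand.

    The goal is true if it is asserted directly, or if the individual is an
    instance of some direct subclass of the target (a sub-goal).
    """
    if asserted.get(individual) == target:
        return True
    return any(
        holds(individual, child, asserted, subclass_of)
        for child, parent in subclass_of.items()
        if parent == target
    )
```

`materialize` must be re-run whenever the data or hierarchy changes, but lookups afterwards are simple set membership tests. `holds` stores nothing extra and always reflects the current data, at the cost of doing the search on every query.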
  • 30. SPARQL Example: GWAS Central. Although the NHGRI project doesn’t currently host a live SPARQL endpoint, it could be set up to do so. The GWAS Central project already does this. (It collates data from a range of sources, including the published literature and collaborating databases such as the NHGRI GWAS Catalog.) SPARQL query page for GWAS Central: http://fuseki.gwascentral.org/query.html
  • 31. SPARQL Example: EBI’s Atlas. EBI hosts the GWAS Diagram, but doesn’t provide a SPARQL endpoint associated with that project. It does, however, host SPARQL endpoints for multiple other projects, such as Atlas. SPARQL query page and multiple examples for EBI’s Atlas project: https://www.ebi.ac.uk/rdf/services/atlas/sparql
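Under the SPARQL 1.1 Protocol, a query can be sent to an endpoint as an HTTP GET with the query text in a `query` parameter. The sketch below only builds such a URL and does not send it; the endpoint URL is the one quoted on the slide (it may no longer be live), and the generic triple pattern is a placeholder rather than a query taken from the Atlas examples.

```python
from urllib.parse import urlencode

# Endpoint URL as given on the slide; availability is not guaranteed.
ATLAS_ENDPOINT = "https://www.ebi.ac.uk/rdf/services/atlas/sparql"

# Placeholder query: fetch any five triples.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"


def build_request_url(endpoint, query):
    """Return a GET URL carrying the query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


url = build_request_url(ATLAS_ENDPOINT, QUERY)
```

The resulting URL could then be fetched with any HTTP client, typically with an `Accept` header asking for a result format such as `application/sparql-results+json`.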
  • 32. GWAS Central: Towards Federation. GWAS Central is a comprehensive resource for the comparison and interrogation of multiple GWAS (genome-wide association studies) projects. It allows for storage, mining and display of summary-level association data. It is more comprehensive than other openly available projects with a similar focus (i.e., millions vs. thousands of P-values). It provides user tools and interfaces not previously available from a single resource. It aggregates other related resources: GWAS Catalog, OADGAR, SNPedia. The GWAS Central platform is available for adoption by other institutes, consortia, teams and countries; ideally, multiple implementations can be federated to allow searching across multiple data sets.
  • 33. GWAS Central: Towards Federation (cont.) Comparison of features for GWAS Central, GWAS Catalog, OADGAR* and SNPedia. (* OADGAR: Open Access Database of Genome-wide Association Results)
  • 34. GWAS Central: Towards Federation (cont.) SPARQL can be used to express queries across diverse data sources, whether the data is stored natively as RDF or viewed as RDF via middleware. This specification defines the syntax and semantics of SPARQL 1.1 Federated Query extension for executing queries distributed over different SPARQL endpoints. The SERVICE keyword extends SPARQL 1.1 to support queries that merge data distributed across the Web. Source: http://www.w3.org/TR/sparql11-federated-query/
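As a rough illustration of the SERVICE keyword, the helper below assembles a federated query string in which one graph pattern is matched locally and another is delegated to a remote endpoint. The endpoint URL, the `ex:` prefix and the predicates `ex:hasTrait` and `ex:annotatedWith` are all hypothetical; they are not taken from the GWAS Catalog or any other named resource.

```python
def federated_query(remote_endpoint, local_pattern, remote_pattern):
    """Build a SPARQL 1.1 query that joins a local and a remote pattern."""
    return (
        "PREFIX ex: <http://example.org/>\n"
        "SELECT * WHERE {\n"
        f"  {local_pattern}\n"
        f"  SERVICE <{remote_endpoint}> {{\n"
        f"    {remote_pattern}\n"
        "  }\n"
        "}"
    )


query = federated_query(
    "https://example.org/sparql",          # hypothetical remote endpoint
    "?association ex:hasTrait ?trait .",   # matched against local data
    "?sample ex:annotatedWith ?trait .",   # matched at the remote endpoint
)
```

Because both patterns share the variable `?trait`, the query engine joins local associations with remote samples annotated with the same trait term.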
  • 35. Setting up GWAS Catalog project to query across data sets. Querying across databases using EFO: since the GWAS Catalog is based on EFO, a query can include other biomedical databases annotated with EFO: ArrayExpress, Ensembl, BioSamples, PRIDE, etc. Querying across databases using other ontologies: even if EFO is not used, cross-reference definition citations allow querying across ontologies. The ID of an external class is added as an annotation on the relevant EFO term. Example: “connective tissue” is an EFO term that has been mapped to terms in other ontologies, such as BTO:0000421, the identifier for connective tissue in the BRENDA ontology.
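The cross-reference idea can be sketched as a term that carries external identifiers as annotations, plus a lookup from an external ID back to the term. The `OntologyTerm` class and `find_by_xref` helper are assumptions made for illustration; only the label "connective tissue" and the identifier BTO:0000421 come from the slide, and the term's own EFO identifier is left out because the slide does not give it.

```python
from dataclasses import dataclass, field


@dataclass
class OntologyTerm:
    """A term with cross-references to identifiers in other ontologies."""
    label: str
    xrefs: set = field(default_factory=set)


def find_by_xref(terms, external_id):
    """Return the terms annotated with the given external identifier."""
    return [term for term in terms if external_id in term.xrefs]


terms = [
    OntologyTerm("connective tissue", {"BTO:0000421"}),
    OntologyTerm("unmapped example term"),  # toy entry with no cross-references
]
matches = find_by_xref(terms, "BTO:0000421")
```

A data set annotated with the BRENDA identifier can be linked to EFO-annotated data by resolving the external ID to the EFO term first and then querying by that term.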