UniProt and the Semantic Web
                     Chimezie Ogbuji
‘Omics’ Data Challenges
 Advances in protein science are a major catalyst for the
  exploding availability of bioinformatics data
 We have already discussed the dimensions of omics
  data:
   Molecular components, interactions, and phenotype
    observations

 Data from large-scale experiments are no longer
  published conventionally but stored in a database
 Protein sequence databases are one of the most
  comprehensive information resources for scientists
Protein Sequence Databases
 Universal protein sequence databases cover all species

 Specialized protein databases are particular to a protein
  family or organism

 Sequence repositories
   A simple registry of sequence records
   No annotations

 Curated protein databases
   Enrich sequence information with links to various sources
    (scientific literature primarily)
Informatics Challenges
 A standard data integration challenge is the lack of
  common conventions

 This applies not just to notation but also to:
   Use of identifiers
   Representation of cross-references
   Framework for defining terms and relationships between
    them

 Links between omics sources are another important
  component of data integration
What is UniProt?
 A comprehensive repository of protein sequences and
  their functional annotations

 Curators add value to raw data by annotating it against the
  scientific literature

 The objective is the creation and maintenance of stable,
  comprehensive, and high-quality protein databases
  with a high level of accessibility, to facilitate
  cross-database information retrieval

 Makes use of Semantic Web technologies to address its
  challenges
UniProt: Core Activities
 Sequence archiving

 Manual (peer-reviewed) and automated curation of
  sequences

 Development of a human- and machine-readable UniProt web
  site

 Interaction with other protein-related databases for
  expanding cross references
UniProt: Components
  UniProtKB – Protein sequence annotations and metadata:
    Protein name, function, taxonomy, enzyme-specific
     information, domains, sites, subcellular location, interactions,
     relationships to disease etc.
    Links to external sources: DNA sequence repositories, protein
     structure databases, protein domain and family databases, and
     species & function-specific data collections
  UniRef – Compresses sequences at different resolutions
    Parameterized by the percent identity between two sequences or
     sub-sequences (100, 90, 50).
  UniParc – Non-redundant database of all publicly
   available protein sequences
    Manages globally unique identifiers, the sequence, information
     on the source database, and a CRC checksum.
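As a rough illustration of the UniParc idea above, the following Python sketch keeps one record per distinct sequence and tracks which source databases reported it. The class names, the `SEQ000001`-style identifiers, the source names `db_a` / `db_b`, and the use of `zlib.crc32` as the checksum are all assumptions made for illustration; they are not taken from UniParc itself.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class ArchiveRecord:
    identifier: str   # hypothetical stable identifier
    sequence: str
    checksum: int     # CRC of the sequence (crc32 used as a stand-in)
    sources: set = field(default_factory=set)


class SequenceArchive:
    """Toy non-redundant archive: one record per distinct sequence."""

    def __init__(self):
        self._by_sequence = {}

    def add(self, sequence, source_db):
        record = self._by_sequence.get(sequence)
        if record is None:
            # First time this exact sequence is seen: mint a new identifier.
            identifier = f"SEQ{len(self._by_sequence) + 1:06d}"
            checksum = zlib.crc32(sequence.encode("ascii"))
            record = ArchiveRecord(identifier, sequence, checksum)
            self._by_sequence[sequence] = record
        # Known or new, remember which database reported the sequence.
        record.sources.add(source_db)
        return record

    def __len__(self):
        return len(self._by_sequence)


archive = SequenceArchive()
first = archive.add("MKTAYIAK", "db_a")
second = archive.add("MKTAYIAK", "db_b")   # same sequence, different source
third = archive.add("MKVLAAGI", "db_a")
```

Adding the same sequence from two sources yields a single record whose `sources` set grows, which is the sense of "non-redundant" sketched here.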
Semantic Web Technologies
 Set of standards for managing web-based content in a way
  that emphasizes use by an automaton
   Automaton: a machine that performs a function according to
    a predetermined set of coded instructions
 The architectural vision (the Semantic Web) is to extend the
  standards and best practices behind the World Wide Web with
  new standards that emphasize the meaning of data over its
  structure.
   Common data formats
   Provide a means to make assertions about the world such that
     an automaton can reason about it through them
 The vision is often confused with the tools meant to achieve
  it (i.e., the set of standards)
RDF: Data Model
 Standardized format for representing arbitrary
  information as a labelled, directed graph

 Comprised of statements: subject, predicate, object

 Terms in statements can be Uniform Resource
  Identifiers (URIs), Blank Nodes (anonymous entities), or
  Literals

 Abstract data model: a labelled, directed graph

 Various serializations: XML-based and text-based
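The data model on this slide can be sketched in a few lines of Python: three kinds of terms, statements built from them, and a graph that is simply a set of statements. The `http://example.org/` namespace and every resource name below are hypothetical; a real application would more likely use an RDF library than hand-rolled classes.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class BlankNode:
    label: str


@dataclass(frozen=True)
class Literal:
    value: str


Term = Union[URI, BlankNode, Literal]


@dataclass(frozen=True)
class Statement:
    subject: Term
    predicate: URI
    object: Term


EX = "http://example.org/"   # hypothetical namespace

# A graph is a set of statements; each statement is one labelled, directed edge.
graph = {
    Statement(URI(EX + "protein1"), URI(EX + "name"), Literal("Protein One")),
    Statement(URI(EX + "protein1"), URI(EX + "organism"), URI(EX + "organismA")),
    Statement(URI(EX + "protein1"), URI(EX + "citation"), BlankNode("b0")),
    Statement(BlankNode("b0"), URI(EX + "title"), Literal("Some paper")),
}


def outgoing(graph, subject):
    """Return the (predicate, object) edges leaving a node, sorted by predicate."""
    edges = [(s.predicate.value, s.object) for s in graph if s.subject == subject]
    return sorted(edges, key=lambda edge: edge[0])


edges = outgoing(graph, URI(EX + "protein1"))
```

Frozen dataclasses are used so that a URI and a blank node with the same text never compare equal, which keeps the three term kinds distinct.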
Information About John Smith
Modelling vocabulary: RDFS/OWL
 RDF Schema (RDFS)
   Simple, minimal schema language for RDF

 Web Ontology Language (OWL)
   Vocabulary for defining classes, relationships, and various
    constraints that limit how RDF is interpreted
   More powerful modeling language

 Tools for constraining & defining reality that can be
  used to codify scientific understanding
 Gene Ontology is modelled in this way to capture our
  understanding of macromolecular reality
Query Language: SPARQL
 Provides a common graph-matching language for
  querying RDF data

 Similar to SQL in many respects
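To make "graph-matching" concrete, here is a minimal Python sketch of how a basic graph pattern could be evaluated: each pattern is matched against every triple, and variables shared between patterns act as joins. The `ex:` names and the data are hypothetical, and this toy matcher covers only a tiny fraction of what SPARQL offers.

```python
def match_pattern(pattern, triple, bindings):
    """Extend bindings so that pattern matches triple; return None on failure."""
    extended = dict(bindings)
    for pattern_term, triple_term in zip(pattern, triple):
        if pattern_term.startswith("?"):
            # Variable: bind it, or check it agrees with an earlier binding.
            if extended.setdefault(pattern_term, triple_term) != triple_term:
                return None
        elif pattern_term != triple_term:
            return None
    return extended


def query(graph, patterns):
    """Match all patterns against the graph, joining on shared variables."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for bindings in solutions
            for triple in graph
            if (extended := match_pattern(pattern, triple, bindings)) is not None
        ]
    return solutions


# Hypothetical data: two proteins, each linked to a named organism.
graph = [
    ("ex:p1", "ex:organism", "ex:orgA"),
    ("ex:p2", "ex:organism", "ex:orgB"),
    ("ex:orgA", "ex:name", "Organism A"),
    ("ex:orgB", "ex:name", "Organism B"),
]

# Roughly: SELECT * WHERE { ?protein ex:organism ?org . ?org ex:name ?name }
results = query(graph, [
    ("?protein", "ex:organism", "?org"),
    ("?org", "ex:name", "?name"),
])
```

The shared `?org` variable plays the role a join column would play in SQL, which is one way to read the comparison on this slide.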
Nature of UniProt Data
 Very large number of cross references to external
  resources

 The cross-reference topology is that of a graph, not a tree

 Automated and manual annotation require storage of
  provenance information (how / when data was
  acquired)

 Requires a framework for both data and metadata
  (data about data)
UniProt Distribution
UniProt: Data Conventions
 All outbound RDF statements are grouped together
  (statements about the same subject)

 Each dataset (a node in the previous graph) is distributed as a
  single file

 Only stores stated data, not entailed data.
   For instance, relationships involving symmetric properties
    are only stored in one direction
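The difference between stated and entailed data can be sketched as follows: the file carries a symmetric relationship in one direction only, and a consumer derives the reverse direction when it needs it. The property name `ex:interactsWith` and the triples are hypothetical examples, not UniProt vocabulary.

```python
# Hypothetical symmetric property and stated triples.
SYMMETRIC = {"ex:interactsWith"}

stated = {
    ("ex:p1", "ex:interactsWith", "ex:p2"),   # stored in one direction only
    ("ex:p1", "ex:name", "Protein One"),
}


def entail_symmetric(triples, symmetric_properties):
    """Return the stated triples plus the reverse of each symmetric statement."""
    entailed = set(triples)
    for subject, predicate, obj in triples:
        if predicate in symmetric_properties:
            entailed.add((obj, predicate, subject))
    return entailed


full = entail_symmetric(stated, SYMMETRIC)
```

Storing only the stated direction keeps the distributed files smaller; the cost is that each consumer has to apply the entailment rule itself.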
UniProt: Naming Conventions
 Generally, in semiotics: a symbol denotes a referent.

 In Web architecture, URIs identify resources
   URIs that can be resolved over the web are URLs

 UniProt URIs identify:
   Resources that correspond to database entries
   Modeling vocabulary that uses standard namespaces: RDFS
    and OWL
   Classes and properties used by UniProt
     For example: http://purl.uniprot.org/core/Gene
   Resources without stable identifiers (from their source)
The Omics Identification Problem
 UniProt uses a templated naming convention:
   http://purl.uniprot.org/{database}/{identifier}
   http://purl.uniprot.org/uniprot/{protein_identifier}

 Problem
     http://purl.uniprot.org/uniprot/P04926 denotes the Malaria
      protein EX-1
     If loading that address in a browser returns a web page, can an
      automaton infer that Malaria protein EX-1 is a web page?
     How do you identify abstract concepts vs. digital media?
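The templated naming convention on this slide is easy to express in code. The sketch below builds a URI from a database name and an identifier and splits one back into its parts; the function names `purl` and `parse_purl` are hypothetical helpers, and the error handling is an assumption rather than anything UniProt prescribes.

```python
from urllib.parse import quote, unquote, urlsplit

BASE = "http://purl.uniprot.org"


def purl(database, identifier):
    """Fill in the {database}/{identifier} template."""
    return f"{BASE}/{quote(database, safe='')}/{quote(identifier, safe='')}"


def parse_purl(uri):
    """Split a templated URI back into (database, identifier)."""
    parts = urlsplit(uri)
    if f"{parts.scheme}://{parts.netloc}" != BASE:
        raise ValueError(f"not a {BASE} URI: {uri}")
    segments = parts.path.strip("/").split("/")
    if len(segments) != 2:
        raise ValueError(f"expected {{database}}/{{identifier}}: {uri}")
    database, identifier = (unquote(segment) for segment in segments)
    return database, identifier


protein_uri = purl("uniprot", "P04926")
```

Note that nothing in the URI itself says whether it names a protein or a document about a protein, which is exactly the problem the slide raises.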
The PURL Solution
 Persistent Uniform Resource Locator (PURL) is a public
  URI management service for allocating a ‘URI space’ as
  a mapping of identifiers (aliases) for resources they are
  not immediately responsible for
 PURLs are web addresses that act as permanent
  identifiers in the face of a dynamic and changing Web
  infrastructure
 A request to a PURL returns a 303 HTTP status code and
  a location:
    303 (See Other) indicates that a response can be found under the
     returned location
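The redirect behaviour can be simulated without any network access. In the sketch below, `toy_server` stands in for both the PURL service and the site it redirects to; the redirect target prefix `http://www.uniprot.org/` and the response bodies are assumptions for illustration, and a real client would issue HTTP requests instead.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Response:
    status: int
    location: Optional[str] = None
    body: Optional[str] = None


PURL_PREFIX = "http://purl.uniprot.org/"
TARGET_PREFIX = "http://www.uniprot.org/"   # assumed redirect target


def toy_server(url):
    """Simulated server: PURLs answer 303 with a location, documents answer 200."""
    if url.startswith(PURL_PREFIX):
        return Response(303, location=TARGET_PREFIX + url[len(PURL_PREFIX):])
    if url.startswith(TARGET_PREFIX):
        return Response(200, body=f"document describing {url}")
    return Response(404)


def resolve(url, max_hops=5):
    """Follow 303 redirects; return the final response and the URLs visited."""
    visited = [url]
    response = toy_server(url)
    while response.status == 303 and len(visited) <= max_hops:
        url = response.location
        visited.append(url)
        response = toy_server(url)
    return response, visited


identifier = "http://purl.uniprot.org/uniprot/P04926"
final_response, visited = resolve(identifier)
```

The client ends up with two distinct addresses: the PURL it started from and the location that actually returned content.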
The PURL Solution: Continued
 Can use PURL addresses to identify abstract concepts

 Redirect requests to such addresses to an informative
  web page (for humans) with a means for machines to
  extract other formats

 RDF statements are about proteins, machines can
  reason about proteins, and humans resolve protein
  identifiers to view informative web pages
 RDF/XML link:

    http://www.uniprot.org/uniprot/P04926.rdf
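One way a client might choose between the human-readable page and the RDF/XML document is sketched below. The `.rdf` suffix follows the link shown on this slide; selecting on an `Accept`-style media type list, and ignoring quality values, are simplifying assumptions, and `representation_url` is a hypothetical helper.

```python
RDF_XML = "application/rdf+xml"


def representation_url(entry_url, accept="text/html"):
    """Pick the document to fetch for an entry from the requested media types."""
    # Drop any parameters such as ";q=0.9"; this sketch ignores preference order.
    wanted = [part.split(";")[0].strip() for part in accept.split(",")]
    if RDF_XML in wanted:
        return entry_url + ".rdf"
    return entry_url


entry = "http://www.uniprot.org/uniprot/P04926"
page_url = representation_url(entry)
rdf_url = representation_url(entry, accept=RDF_XML)
```

A person's browser would end up at `page_url`, while a program asking for RDF would fetch `rdf_url`.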
UniProt: Protein Class
UniProt: Annotation Hierarchy
Serendipitous Re-use
 Having a rich repository of protein sequence metadata,
  annotations, and taxonomic classification in a
  distributed, standard format encourages scientific
  collaboration
General UniProt Re-Use Scenario
 User A refers to protein P1 in their dataset
   User A’s dataset doesn’t include statements about P1 (the
    host organism for instance)

 User B comes across this dataset and (in order to find
  out more about protein P1) puts the URI of protein P1
  in their browser and pulls up human-readable
  information about it (including the host organism)
 Automaton C comes across the same dataset, fetches
  the web page, fetches the RDF about P1 and has access
  to the same information as user B and can reason about
  the major taxon the host organism belongs to
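The scenario can be sketched as a merge followed by a graph walk: Automaton C combines User A's dataset with the statements it fetched about P1, then follows the class hierarchy upward from the host organism. Every identifier below (`ex:P1`, `ex:SpeciesX`, `ex:KingdomZ`, the `ex:organism` and `ex:mentions` properties) is a hypothetical placeholder, and the fetched statements are hard-coded rather than retrieved.

```python
# User A's dataset refers to P1 but says nothing about it.
dataset_a = {("ex:sample1", "ex:mentions", "ex:P1")}

# Statements Automaton C would obtain by resolving P1's URI (hard-coded here).
fetched = {
    ("ex:P1", "ex:organism", "ex:SpeciesX"),
    ("ex:SpeciesX", "rdfs:subClassOf", "ex:GenusY"),
    ("ex:GenusY", "rdfs:subClassOf", "ex:KingdomZ"),
}

MAJOR_TAXA = {"ex:KingdomZ"}   # hypothetical set of top-level taxa


def objects(triples, subject, predicate):
    """All objects of statements with the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def major_taxon(triples, protein):
    """Walk from the protein's host organism up the subclass chain."""
    seen = set()
    frontier = objects(triples, protein, "ex:organism")
    while frontier:
        taxon = frontier.pop()
        if taxon in MAJOR_TAXA:
            return taxon
        if taxon not in seen:
            seen.add(taxon)
            frontier.extend(objects(triples, taxon, "rdfs:subClassOf"))
    return None


merged = dataset_a | fetched   # RDF graphs merge by set union
answer = major_taxon(merged, "ex:P1")
```

Because both datasets use the same URI for P1, the merge needs no mapping step; that shared identifier is what makes the re-use "serendipitous".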
References

 Wu, C. et al., “The Universal Protein Resource
  (UniProt): an expanding universe of protein
  information”. Nucleic Acids Research, vol. 34, 2006

 Swiss Institute of Bioinformatics, “UniProt RDF (project
  page)”. http://dev.isb-sib.ch/projects/uniprot-rdf/

 Redaschi, N. and UniProt Consortium, “UniProt in RDF:
  Tackling Data Integration and Distributed Annotation”.
  Nature Precedings, 3rd International Biocuration
  Conference, April 2009.
  http://precedings.nature.com/documents/3193/version/1

The Most Attractive Hyderabad Call Girls Kothapet 𖠋 6297143586 𖠋 Will You Mis...
 
Premium Call Girls Cottonpet Whatsapp 7001035870 Independent Escort Service
Premium Call Girls Cottonpet Whatsapp 7001035870 Independent Escort ServicePremium Call Girls Cottonpet Whatsapp 7001035870 Independent Escort Service
Premium Call Girls Cottonpet Whatsapp 7001035870 Independent Escort Service
 
Call Girls Coimbatore Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Coimbatore Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Coimbatore Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Coimbatore Just Call 9907093804 Top Class Call Girl Service Available
 
Night 7k to 12k Chennai City Center Call Girls 👉👉 7427069034⭐⭐ 100% Genuine E...
Night 7k to 12k Chennai City Center Call Girls 👉👉 7427069034⭐⭐ 100% Genuine E...Night 7k to 12k Chennai City Center Call Girls 👉👉 7427069034⭐⭐ 100% Genuine E...
Night 7k to 12k Chennai City Center Call Girls 👉👉 7427069034⭐⭐ 100% Genuine E...
 
Book Paid Powai Call Girls Mumbai 𖠋 9930245274 𖠋Low Budget Full Independent H...
Book Paid Powai Call Girls Mumbai 𖠋 9930245274 𖠋Low Budget Full Independent H...Book Paid Powai Call Girls Mumbai 𖠋 9930245274 𖠋Low Budget Full Independent H...
Book Paid Powai Call Girls Mumbai 𖠋 9930245274 𖠋Low Budget Full Independent H...
 
Call Girls Dehradun Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Dehradun Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Dehradun Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Dehradun Just Call 9907093804 Top Class Call Girl Service Available
 
VIP Call Girls Indore Kirti 💚😋 9256729539 🚀 Indore Escorts
VIP Call Girls Indore Kirti 💚😋  9256729539 🚀 Indore EscortsVIP Call Girls Indore Kirti 💚😋  9256729539 🚀 Indore Escorts
VIP Call Girls Indore Kirti 💚😋 9256729539 🚀 Indore Escorts
 
Call Girls Bangalore Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Bangalore Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Bangalore Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Bangalore Just Call 9907093804 Top Class Call Girl Service Available
 
Call Girls Bareilly Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Bareilly Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Bareilly Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Bareilly Just Call 9907093804 Top Class Call Girl Service Available
 
Call Girls Nagpur Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Nagpur Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Nagpur Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Nagpur Just Call 9907093804 Top Class Call Girl Service Available
 
VIP Hyderabad Call Girls Bahadurpally 7877925207 ₹5000 To 25K With AC Room 💚😋
VIP Hyderabad Call Girls Bahadurpally 7877925207 ₹5000 To 25K With AC Room 💚😋VIP Hyderabad Call Girls Bahadurpally 7877925207 ₹5000 To 25K With AC Room 💚😋
VIP Hyderabad Call Girls Bahadurpally 7877925207 ₹5000 To 25K With AC Room 💚😋
 
Top Rated Bangalore Call Girls Mg Road ⟟ 8250192130 ⟟ Call Me For Genuine Sex...
Top Rated Bangalore Call Girls Mg Road ⟟ 8250192130 ⟟ Call Me For Genuine Sex...Top Rated Bangalore Call Girls Mg Road ⟟ 8250192130 ⟟ Call Me For Genuine Sex...
Top Rated Bangalore Call Girls Mg Road ⟟ 8250192130 ⟟ Call Me For Genuine Sex...
 
Call Girls Tirupati Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Tirupati Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Tirupati Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Tirupati Just Call 9907093804 Top Class Call Girl Service Available
 
VIP Mumbai Call Girls Hiranandani Gardens Just Call 9920874524 with A/C Room ...
VIP Mumbai Call Girls Hiranandani Gardens Just Call 9920874524 with A/C Room ...VIP Mumbai Call Girls Hiranandani Gardens Just Call 9920874524 with A/C Room ...
VIP Mumbai Call Girls Hiranandani Gardens Just Call 9920874524 with A/C Room ...
 
Call Girls Visakhapatnam Just Call 9907093804 Top Class Call Girl Service Ava...
Call Girls Visakhapatnam Just Call 9907093804 Top Class Call Girl Service Ava...Call Girls Visakhapatnam Just Call 9907093804 Top Class Call Girl Service Ava...
Call Girls Visakhapatnam Just Call 9907093804 Top Class Call Girl Service Ava...
 
College Call Girls in Haridwar 9667172968 Short 4000 Night 10000 Best call gi...
College Call Girls in Haridwar 9667172968 Short 4000 Night 10000 Best call gi...College Call Girls in Haridwar 9667172968 Short 4000 Night 10000 Best call gi...
College Call Girls in Haridwar 9667172968 Short 4000 Night 10000 Best call gi...
 
(Rocky) Jaipur Call Girl - 09521753030 Escorts Service 50% Off with Cash ON D...
(Rocky) Jaipur Call Girl - 09521753030 Escorts Service 50% Off with Cash ON D...(Rocky) Jaipur Call Girl - 09521753030 Escorts Service 50% Off with Cash ON D...
(Rocky) Jaipur Call Girl - 09521753030 Escorts Service 50% Off with Cash ON D...
 
(👑VVIP ISHAAN ) Russian Call Girls Service Navi Mumbai🖕9920874524🖕Independent...
(👑VVIP ISHAAN ) Russian Call Girls Service Navi Mumbai🖕9920874524🖕Independent...(👑VVIP ISHAAN ) Russian Call Girls Service Navi Mumbai🖕9920874524🖕Independent...
(👑VVIP ISHAAN ) Russian Call Girls Service Navi Mumbai🖕9920874524🖕Independent...
 
Bangalore Call Girls Nelamangala Number 7001035870 Meetin With Bangalore Esc...
Bangalore Call Girls Nelamangala Number 7001035870  Meetin With Bangalore Esc...Bangalore Call Girls Nelamangala Number 7001035870  Meetin With Bangalore Esc...
Bangalore Call Girls Nelamangala Number 7001035870 Meetin With Bangalore Esc...
 

UniProt and the Semantic Web

  • 1. UniProt and the Semantic Web Chimezie Ogbuji
  • 2. ‘Omics’ Data Challenges
      - Advances in protein science are a major catalyst in the exploding availability of bioinformatics data
      - We have already discussed the dimensions of omics data:
          - Molecular components, interactions, and phenotype observations
      - Data from large-scale experiments are no longer published conventionally but stored in a database
      - Protein sequence databases are among the most comprehensive information resources for scientists
  • 3. Protein Sequence Databases
      - Universal protein sequence databases cover all species
      - Specialized protein databases are particular to a protein family or organism
      - Sequence repositories
          - A simple registry of sequence records
          - No annotations
      - Curated protein databases
          - Enrich sequence information with links to various sources (primarily scientific literature)
  • 4. Informatics Challenges
      - The standard data integration challenge is the lack of common conventions
      - This applies not just to notation but also to:
          - Use of identifiers
          - Representation of cross-references
          - Framework for defining terms and the relationships between them
      - Links between omics sources are another important component of data integration
  • 5. What is UniProt?
      - A comprehensive repository of protein sequences and their functional annotations
      - Curators add value to raw data by annotating it against the scientific literature
      - Objective: the creation and maintenance of stable, comprehensive, high-quality protein databases, with a high level of accessibility, to facilitate cross-database information retrieval
      - Makes use of Semantic Web technologies to address its challenges
  • 6. UniProt: Core Activities
      - Sequence archiving
      - Manual (peer-reviewed) and automated curation of sequences
      - Development of a human- and machine-readable UniProt web site
      - Interaction with other protein-related databases to expand cross-references
  • 7. UniProt: Components
      - UniProtKB: protein sequence annotations and metadata
          - Protein name, function, taxonomy, enzyme-specific information, domains, sites, subcellular location, interactions, relationships to disease, etc.
          - Links to external sources: DNA sequence repositories, protein structure databases, protein domain and family databases, and species- and function-specific data collections
      - UniRef: compresses sequences at different resolutions
          - Parameterized by the percentage identity between two sequences or sub-sequences (100, 90, 50)
      - UniParc: non-redundant database of all publicly available protein sequences
          - Manages globally unique identifiers, the sequence, information on the source database, and a CRC check number
  • 8. Semantic Web Technologies
      - A set of standards for managing web-based content in a way that emphasizes use by an automaton
          - Automaton: a machine that performs a function according to a predetermined set of coded instructions
      - The architectural vision (the Semantic Web) is to extend the standards and best practices behind the World Wide Web with new standards that emphasize the meaning of data over its structure
          - Common data formats
          - Provide a means to make assertions about the world such that an automaton can reason about it through them
      - The vision is often confused with the tools meant to achieve it (i.e., the set of standards)
  • 9.
  • 10. RDF: Data Model
      - Standardized format for representing arbitrary information as a labelled, directed graph
      - Composed of statements: subject, predicate, object
      - Terms in statements can be Uniform Resource Identifiers (URIs), blank nodes (anonymous entities), or literals
      - Abstract data model: a labelled, directed graph
      - Various serializations: XML-based and text-based
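The statement model on this slide can be sketched with the Python standard library alone. The `URI`, `BlankNode`, `Literal`, and `Statement` classes, the `to_ntriples` helper, and the `http://example.org/vocab/...` predicates are all illustrative assumptions; they are not the API of any RDF library, and they are not UniProt's actual vocabulary. Only the protein URI comes from the slides.

```python
from dataclasses import dataclass
from typing import NamedTuple, Union


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class BlankNode:
    label: str


@dataclass(frozen=True)
class Literal:
    value: str


Term = Union[URI, BlankNode, Literal]


class Statement(NamedTuple):
    """One RDF statement: an edge from subject to object, labelled by the predicate."""
    subject: Term
    predicate: URI
    object: Term


def to_ntriples(stmt: Statement) -> str:
    """Serialize one statement in a simplified, N-Triples-like text form."""
    def fmt(term: Term) -> str:
        if isinstance(term, URI):
            return f"<{term.value}>"
        if isinstance(term, BlankNode):
            return f"_:{term.label}"
        escaped = term.value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    return " ".join(fmt(t) for t in stmt) + " ."


# A graph is simply a set of statements (hypothetical predicates).
protein = URI("http://purl.uniprot.org/uniprot/P04926")
graph = {
    Statement(protein, URI("http://example.org/vocab/name"), Literal("Malaria protein EX-1")),
    Statement(protein, URI("http://example.org/vocab/annotation"), BlankNode("a1")),
}
lines = sorted(to_ntriples(s) for s in graph)
```

Because the graph is a plain set, stating the same fact twice has no effect, which matches the set-of-statements reading of the abstract data model.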
  • 12. Modelling vocabulary: RDFS/OWL
      - RDF Schema (RDFS)
          - Simple, minimal schema language for RDF
      - Web Ontology Language (OWL)
          - Vocabulary for defining classes, relationships, and various constraints that limit how RDF is interpreted
          - More powerful modeling language
      - Tools for constraining and defining reality that can be used to codify scientific understanding
          - Gene Ontology is modelled in this way to capture our understanding of macromolecular reality
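To make "constraints that limit how RDF is interpreted" concrete, here is a minimal sketch of one RDFS-style inference: if a resource has a type, it also has every superclass of that type. Triples are plain tuples of strings, and the `ex:` class names are hypothetical; a real reasoner covers many more rules than this one.

```python
SUBCLASS_OF = "rdfs:subClassOf"
TYPE = "rdf:type"


def infer_types(triples):
    """Return the stated triples plus rdf:type triples entailed via rdfs:subClassOf."""
    inferred = set(triples)
    changed = True
    while changed:  # repeat until no new statement can be derived
        changed = False
        subclass_pairs = {(s, o) for s, p, o in inferred if p == SUBCLASS_OF}
        for s, p, o in list(inferred):
            if p != TYPE:
                continue
            for sub, sup in subclass_pairs:
                if sub == o and (s, TYPE, sup) not in inferred:
                    inferred.add((s, TYPE, sup))
                    changed = True
    return inferred


# Hypothetical class hierarchy and one typed resource.
stated = {
    ("ex:Enzyme", SUBCLASS_OF, "ex:Protein"),
    ("ex:Protein", SUBCLASS_OF, "ex:Macromolecule"),
    ("ex:P1", TYPE, "ex:Enzyme"),
}
entailed = infer_types(stated) - stated
```

Here `entailed` holds the two statements that were never written down but follow from the schema: `ex:P1` is a protein and a macromolecule.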
  • 13.
  • 14. Query Language: SPARQL
      - Provides a common graph-matching language for querying RDF data
      - Similar to SQL in many respects
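The graph-matching idea can be sketched as a toy matcher for basic graph patterns. This is not a SPARQL engine: `match` and `query` are hypothetical helpers, terms are plain strings, and any term starting with `?` is treated as a variable. The patterns correspond roughly to the SPARQL shown in the comment, with made-up `ex:` names.

```python
# Roughly equivalent SPARQL (hypothetical vocabulary):
#   SELECT ?protein ?organism WHERE {
#     ?protein  ex:organism ?organism .
#     ?organism ex:name     "Plasmodium falciparum" .
#   }


def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`; return None on conflict."""
    result = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if p_term.startswith("?"):
            # Bind the variable, or check it against its existing value.
            if result.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return result


def query(graph, patterns):
    """Return every variable binding that satisfies all patterns (a join, as in SQL)."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in graph
            if (extended := match(pattern, triple, binding)) is not None
        ]
    return bindings


graph = [
    ("ex:P1", "ex:organism", "ex:T1"),
    ("ex:T1", "ex:name", "Plasmodium falciparum"),
    ("ex:P2", "ex:organism", "ex:T2"),
    ("ex:T2", "ex:name", "Homo sapiens"),
]
patterns = [
    ("?protein", "ex:organism", "?organism"),
    ("?organism", "ex:name", "Plasmodium falciparum"),
]
results = query(graph, patterns)
```

The shared variable `?organism` plays the role of a join key, which is where the resemblance to SQL is strongest.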
  • 15. Nature of UniProt Data
      - Very large number of cross-references to external resources
      - The cross-reference topology is that of a graph, not a tree
      - Automated and manual annotation require storage of provenance information (how and when data was acquired)
      - Requires a framework for both data and metadata (data about data)
  • 17. UniProt: Data Conventions
      - All outbound RDF statements are grouped together (statements about the same subject)
      - Datasets (nodes in the previous graph) are each distributed as a single file
      - Only stated data is stored, not entailed data
          - For instance, relationships involving symmetric properties are stored in only one direction
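The stated-versus-entailed convention means a consumer has to derive the reverse direction of a symmetric relationship itself. A minimal sketch, assuming triples are plain tuples and using a hypothetical `ex:interactsWith` property that the consumer knows to be symmetric:

```python
def with_symmetric_entailments(triples, symmetric_properties):
    """Return the stated triples plus the reverse of every symmetric statement."""
    entailed = set(triples)
    for s, p, o in triples:
        if p in symmetric_properties:
            entailed.add((o, p, s))
    return entailed


# Stored in one direction only, as the slide describes.
stated = {
    ("ex:P1", "ex:interactsWith", "ex:P2"),
    ("ex:P1", "ex:encodedBy", "ex:G1"),
}
complete = with_symmetric_entailments(stated, {"ex:interactsWith"})
```

Storing one direction halves the size of the symmetric part of the dataset, at the cost of this extra step on the consumer's side.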
  • 18.
  • 19. UniProt: Naming Conventions
      - Generally, in semiotics, a symbol denotes a referent
      - In Web architecture, URIs identify resources
          - URIs that can be resolved over the web are URLs
      - UniProt URIs identify:
          - Resources that correspond to database entries
          - Modeling vocabulary that uses standard namespaces: RDFS and OWL
          - Classes and properties used by UniProt
              - For example: http://purl.uniprot.org/core/Gene
          - Resources without stable identifiers (from their source)
  • 20. The Omics Identification Problem
      - UniProt uses a templated naming convention:
          - http://purl.uniprot.org/{database}/{identifier}
          - http://purl.uniprot.org/uniprot/{protein_identifier}
      - Problem
          - http://purl.uniprot.org/uniprot/P04926 denotes the Malaria protein EX-1
          - If loading that address in a browser returns a web page, can an automaton infer that Malaria protein EX-1 is a web page?
          - How do you identify abstract concepts vs. digital media?
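The templated convention can be sketched as a pair of helpers that build and take apart such URIs. The function names `purl` and `parse_purl` are hypothetical; only the `http://purl.uniprot.org/{database}/{identifier}` template and the example URIs come from the slides.

```python
from urllib.parse import quote, unquote

PURL_BASE = "http://purl.uniprot.org"


def purl(database: str, identifier: str) -> str:
    """Fill in the {database}/{identifier} template, percent-encoding each part."""
    return f"{PURL_BASE}/{quote(database, safe='')}/{quote(identifier, safe='')}"


def parse_purl(uri: str) -> tuple:
    """Split a templated URI back into (database, identifier)."""
    prefix = PURL_BASE + "/"
    if not uri.startswith(prefix):
        raise ValueError(f"not a {PURL_BASE} URI: {uri}")
    database, sep, identifier = uri[len(prefix):].partition("/")
    if not sep or not database or not identifier:
        raise ValueError(f"URI does not follow the template: {uri}")
    return unquote(database), unquote(identifier)


protein_uri = purl("uniprot", "P04926")
```

Note that the helpers only manipulate names. Whether the resulting URI denotes a protein or a web page is exactly the question the slide raises, and string handling cannot answer it.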
  • 21. The PURL Solution
      - Persistent Uniform Resource Locator (PURL) is a public URI management service that lets maintainers allocate a ‘URI space’ of identifiers (aliases) for resources they are not immediately responsible for
      - PURLs are web addresses that act as permanent identifiers in the face of a dynamic and changing Web infrastructure
      - A request to a PURL returns a 303 HTTP status code and a location:
          - 303 indicates that a response can be found under the returned location
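The redirect behaviour can be modelled without any network access. The sketch below is a simulation, not the PURL service's implementation: `REDIRECTS` and `resolve` are hypothetical, and the target address is an assumed example of where such a redirect might point.

```python
from http import HTTPStatus

# Hypothetical alias table: persistent path -> current location (assumed target).
REDIRECTS = {
    "/uniprot/P04926": "http://www.uniprot.org/uniprot/P04926",
}


def resolve(path: str):
    """Answer a request for a persistent path the way a PURL-style resolver might."""
    target = REDIRECTS.get(path)
    if target is None:
        return HTTPStatus.NOT_FOUND, {}
    # 303 See Other: a description of the resource is available elsewhere.
    return HTTPStatus.SEE_OTHER, {"Location": target}


status, headers = resolve("/uniprot/P04926")
```

If the web site moves, only the entry in the alias table changes; every dataset that uses the persistent address stays valid.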
  • 22. The PURL Solution: Continued
      - PURL addresses can be used to identify abstract concepts
      - Requests to such addresses are redirected to an informative web page (for humans) that gives machines a means to extract other formats
      - RDF statements are about proteins, machines can reason about proteins, and humans resolve protein identifiers to view informative web pages
  • 23. RDF/XML link: http://www.uniprot.org/uniprot/P04926.rdf
  • 26. Serendipitous Re-use
      - Having a rich repository of protein sequence metadata, annotations, and taxonomic classification in a distributed, standard format encourages scientific collaboration
  • 27. General UniProt Re-Use Scenario
      - User A refers to protein P1 in their dataset
      - User A’s dataset doesn’t include statements about P1 (the host organism, for instance)
      - User B comes across this dataset and, in order to find out more about protein P1, puts the URI of P1 in their browser and pulls up human-readable information about it (including the host organism)
      - Automaton C comes across the same dataset, fetches the web page, fetches the RDF about P1, has access to the same information as User B, and can reason about the major taxon the host organism belongs to
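Automaton C's part of the scenario can be sketched as merging two sets of statements and walking up a taxonomy. Everything named here is hypothetical: the `ex:` resources and properties, the `lineage` helper, and the `fetched` set, which stands in for RDF that would in practice be retrieved from the protein's URI.

```python
def lineage(graph, taxon, parent_property="ex:parentTaxon"):
    """Return the ancestors of `taxon`, nearest first, by following parent links."""
    parents = {s: o for s, p, o in graph if p == parent_property}
    chain, seen = [], set()
    while taxon in parents and taxon not in seen:  # `seen` guards against cycles
        seen.add(taxon)
        taxon = parents[taxon]
        chain.append(taxon)
    return chain


# User A's dataset mentions P1 but says nothing about its host organism.
user_a = {("ex:experiment42", "ex:measured", "ex:P1")}

# Stand-in for the RDF that Automaton C fetches about P1.
fetched = {
    ("ex:P1", "ex:hostOrganism", "ex:SpeciesX"),
    ("ex:SpeciesX", "ex:parentTaxon", "ex:GenusY"),
    ("ex:GenusY", "ex:parentTaxon", "ex:KingdomZ"),
}

merged = user_a | fetched  # shared URIs make merging a plain set union
host = next(o for s, p, o in merged if s == "ex:P1" and p == "ex:hostOrganism")
major_taxon = lineage(merged, host)[-1]
```

The merge needs no mapping step because both datasets name the protein with the same URI, which is the point of the re-use scenario.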
  • 28. References
      - Wu, C. et al., “The Universal Protein Resource (UniProt): an expanding universe of protein information”. Nucleic Acids Research, vol. 34, 2006.
      - Swiss Institute of Bioinformatics, “UniProt RDF (project page)”. http://dev.isb-sib.ch/projects/uniprot-rdf/
      - Redaschi, N. and UniProt Consortium, “UniProt in RDF: Tackling Data Integration and Distributed Annotation”. Nature Precedings, 3rd International Biocuration Conference, April 2009. http://precedings.nature.com/documents/3193/version/1