UniProt and the Semantic Web
Chimezie Ogbuji
‘Omics’ Data Challenges
- Advances in protein science are a major catalyst in the explosive growth of available bioinformatics data
- We have already discussed the dimensions of omics data:
  - Molecular components, interactions, and phenotype observations
- Data from large-scale experiments are no longer published conventionally; instead, they are stored in databases
- Protein sequence databases are among the most comprehensive information resources for scientists
Protein Sequence Databases
- Universal protein sequence databases cover all species
- Specialized protein databases are specific to a protein family or organism
- Sequence repositories
  - A simple registry of sequence records
  - No annotations
- Curated protein databases
  - Enrich sequence information with links to various sources (primarily the scientific literature)
Informatics Challenges
- A standard data integration challenge is the lack of common conventions
- This applies not just to notation but also to:
  - The use of identifiers
  - The representation of cross-references
  - The framework for defining terms and the relationships between them
- Links between omics sources are another important component of data integration
What is UniProt?
- A comprehensive repository of protein sequences and their functional annotations
- Curators add value to raw data by annotating it against the scientific literature
- Objective: to create and maintain stable, comprehensive, high-quality protein databases with a high level of accessibility, to facilitate cross-database information retrieval
- Uses Semantic Web technologies to address these challenges
UniProt: Core Activities
- Sequence archiving
- Manual (peer-reviewed) and automated curation of sequences
- Development of a human- and machine-readable UniProt website
- Interaction with other protein-related databases to expand cross-references
UniProt: Components
- UniProtKB: protein sequence annotations and metadata
  - Protein name, function, taxonomy, enzyme-specific information, domains, sites, subcellular location, interactions, relationships to disease, etc.
  - Links to external sources: DNA sequence repositories, protein structure databases, protein domain and family databases, and species- and function-specific data collections
- UniRef: clusters sequences to compress the sequence space at different resolutions
  - Parameterized by the percent identity between two sequences or sub-sequences (100%, 90%, 50%)
- UniParc: a non-redundant database of all publicly available protein sequences
  - Stores globally unique identifiers, the sequence, information on the source database, and a CRC checksum
Semantic Web Technologies
- A set of standards for managing web-based content in a way that emphasizes its use by an automaton
  - Automaton: a machine that performs a function according to a predetermined set of coded instructions
- The architectural vision (the Semantic Web) is to extend the standards and best practices behind the World Wide Web with new standards that emphasize the meaning of data over its structure:
  - Common data formats
  - A means of making assertions about the world that an automaton can use to reason about it
- The vision is often confused with the tools meant to achieve it (i.e., the set of standards)
RDF: Data Model
- A standardized format for representing arbitrary information as a labelled, directed graph
- Composed of statements (triples): subject, predicate, object
- Terms in statements can be Uniform Resource Identifiers (URIs), blank nodes (anonymous entities), or literals
  - Subjects are URIs or blank nodes, predicates are URIs, and objects can be any of the three
- Abstract data model: a labelled, directed graph
- Various serializations: XML-based (RDF/XML) and text-based (e.g., N-Triples, Turtle)
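To make the data model concrete, the sketch below does three things:
- It represents the three kinds of RDF terms as small Python classes.
- It stores statements as (subject, predicate, object) triples.
- It prints them in N-Triples, one of the text-based serializations.

The classes are hypothetical stand-ins for what an RDF library would provide. The protein URI is the UniProt example used elsewhere in this deck. The use of rdfs:label, the core/Protein class, and the example.org annotation property are illustrative assumptions.

```python
"""Sketch: RDF terms, triples, and N-Triples output using only the standard library."""
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class URI:
    value: str

    def n3(self) -> str:
        return f"<{self.value}>"


@dataclass(frozen=True)
class BlankNode:
    label: str  # local label; meaningless outside this graph

    def n3(self) -> str:
        return f"_:{self.label}"


@dataclass(frozen=True)
class Literal:
    lexical: str

    def n3(self) -> str:
        # Escape backslashes, quotes, and line breaks inside the quoted string.
        escaped = (self.lexical.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))
        return f'"{escaped}"'


Term = Union[URI, BlankNode, Literal]
Triple = tuple[Term, URI, Term]  # (subject, predicate, object)

RDF_TYPE = URI("http://www.w3.org/1999/02/22-rdf-syntax-ns#type")
RDFS_LABEL = URI("http://www.w3.org/2000/01/rdf-schema#label")
RDFS_COMMENT = URI("http://www.w3.org/2000/01/rdf-schema#comment")
HAS_NOTE = URI("http://example.org/hasNote")  # hypothetical property


def to_ntriples(graph: set[Triple]) -> str:
    """Serialize triples one statement per line, sorted for stable output."""
    lines = sorted(f"{s.n3()} {p.n3()} {o.n3()} ." for s, p, o in graph)
    return "\n".join(lines) + "\n"


protein = URI("http://purl.uniprot.org/uniprot/P04926")
note = BlankNode("note1")  # an anonymous node, identified only within this graph
graph: set[Triple] = {
    (protein, RDF_TYPE, URI("http://purl.uniprot.org/core/Protein")),
    (protein, RDFS_LABEL, Literal("Malaria protein EX-1")),
    (protein, HAS_NOTE, note),
    (note, RDFS_COMMENT, Literal("Illustrative annotation text")),
}
print(to_ntriples(graph), end="")
```

Because the graph is a set of triples, adding the same statement twice has no effect. This matches RDF's view of a graph as a set rather than a list.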
Information About John Smith
Modelling vocabulary: RDFS/OWL
- RDF Schema (RDFS)
  - A simple, minimal schema language for RDF
- Web Ontology Language (OWL)
  - A vocabulary for defining classes, relationships, and various constraints that limit how RDF is interpreted
  - A more expressive modeling language than RDFS
- Both are tools for defining and constraining reality that can be used to codify scientific understanding
- The Gene Ontology is modelled in this way to capture our understanding of macromolecular reality
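A schema lets a machine derive statements that were never written down. Two RDFS entailment rules show this:
- rdfs11: subClassOf is transitive.
- rdfs9: an instance of a class is also an instance of every superclass of that class.

The sketch below applies only these two rules until no new triples appear. The ex: classes and instance are hypothetical and are not taken from the Gene Ontology or UniProt.

```python
"""Sketch: forward-chaining two RDFS rules (rdfs9, rdfs11) to a fixed point."""

RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def rdfs_closure(triples: set[tuple[str, str, str]]) -> set[tuple[str, str, str]]:
    """Return the input plus every triple derivable via rdfs9 and rdfs11."""
    closed = set(triples)
    while True:
        subclass = {(s, o) for s, p, o in closed if p == SUBCLASS_OF}
        new = set()
        # rdfs11: A subClassOf B and B subClassOf C  =>  A subClassOf C
        for a, b in subclass:
            for c, d in subclass:
                if b == c:
                    new.add((a, SUBCLASS_OF, d))
        # rdfs9: x type A and A subClassOf B  =>  x type B
        for x, p, cls in closed:
            if p == RDF_TYPE:
                for sub, sup in subclass:
                    if sub == cls:
                        new.add((x, RDF_TYPE, sup))
        if new <= closed:  # fixed point: nothing new was derived
            return closed
        closed |= new


# Hypothetical toy hierarchy; the ex: names are illustrative only.
data = {
    ("ex:Kinase", SUBCLASS_OF, "ex:Enzyme"),
    ("ex:Enzyme", SUBCLASS_OF, "ex:Protein"),
    ("ex:p1", RDF_TYPE, "ex:Kinase"),
}
inferred = rdfs_closure(data)
print(sorted(t for t in inferred if t[1] == RDF_TYPE))
```

OWL supports many more constructs than these two rules, so real ontologies are usually handled by a dedicated reasoner. The fixed-point loop is the same basic idea.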
Query Language: SPARQL
- Provides a common graph-matching language for querying RDF data
- Similar to SQL in many respects
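The core of a SPARQL query is a basic graph pattern: a set of triple patterns containing variables. A solution is any assignment of the variables that makes every pattern match a triple in the data, which plays the same role as a join in SQL. The sketch below implements only that matching step and none of FILTER, OPTIONAL, or the other SPARQL features. The query it evaluates is shown as a comment, and the ex: data is hypothetical.

```python
"""Sketch: evaluating a SPARQL-style basic graph pattern over a set of triples."""
from typing import Optional

Triple = tuple[str, str, str]
Binding = dict[str, str]


def is_var(term: str) -> bool:
    """Treat terms starting with '?' as variables, as SPARQL does."""
    return term.startswith("?")


def match_pattern(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    result = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            if result.get(p_term, t_term) != t_term:
                return None  # variable already bound to a different value
            result[p_term] = t_term
        elif p_term != t_term:
            return None  # constant term does not match
    return result


def evaluate_bgp(patterns: list[Triple], graph: set[Triple]) -> list[Binding]:
    """Join the triple patterns left to right, keeping only consistent bindings."""
    solutions: list[Binding] = [{}]
    for pattern in patterns:
        next_solutions = []
        for binding in solutions:
            for triple in graph:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


graph = {
    ("ex:P1", "rdf:type", "ex:Protein"),
    ("ex:P1", "ex:organism", "ex:taxonA"),
    ("ex:P2", "rdf:type", "ex:Protein"),
    ("ex:P2", "ex:organism", "ex:taxonB"),
    ("ex:taxonA", "rdfs:label", "Taxon A"),
}
# SELECT ?protein ?name WHERE {
#   ?protein rdf:type ex:Protein .
#   ?protein ex:organism ?taxon .
#   ?taxon rdfs:label ?name .
# }
query = [
    ("?protein", "rdf:type", "ex:Protein"),
    ("?protein", "ex:organism", "?taxon"),
    ("?taxon", "rdfs:label", "?name"),
]
for row in evaluate_bgp(query, graph):
    print(row["?protein"], row["?name"])
```

ex:P2 drops out at the third pattern because its taxon has no label. This is the same behaviour as an inner join in SQL.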
Nature of UniProt Data
- A very large number of cross-references to external resources
- The cross-reference topology is that of a graph, not a tree
- Automated and manual annotation require storing provenance information (how and when data was acquired)
- Requires a framework for both data and metadata (data about data)
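One standard way to attach metadata to an individual statement in plain RDF is reification:
- A resource typed rdf:Statement stands for the statement.
- rdf:subject, rdf:predicate, and rdf:object point at the statement's parts.
- Provenance properties hang off that resource.

The sketch below shows the pattern with hypothetical ex: properties for how and when the data was acquired. It illustrates the idea and does not claim to describe UniProt's exact attribution model. Reifying a statement does not assert it, so the original triple is added separately.

```python
"""Sketch: recording provenance for a statement with RDF reification."""
import itertools

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX = "http://example.org/"  # hypothetical namespace for provenance properties

Triple = tuple[str, str, str]
_ids = itertools.count(1)


def reify(triple: Triple, method: str, date: str) -> set[Triple]:
    """Describe `triple` as an rdf:Statement and attach how/when it was acquired."""
    s, p, o = triple
    stmt = f"_:stmt{next(_ids)}"  # blank node standing for the statement itself
    return {
        (stmt, RDF + "type", RDF + "Statement"),
        (stmt, RDF + "subject", s),
        (stmt, RDF + "predicate", p),
        (stmt, RDF + "object", o),
        (stmt, EX + "curationMethod", method),  # how (hypothetical property)
        (stmt, EX + "acquiredOn", date),  # when (hypothetical property)
    }


def provenance_of(graph: set[Triple], triple: Triple) -> list[dict[str, str]]:
    """Collect the provenance properties of every reification of `triple`."""
    records = []
    statements = {s for s, p, o in graph if p == RDF + "type" and o == RDF + "Statement"}
    for stmt in statements:
        props = {p: o for s, p, o in graph if s == stmt}
        parts = (props.get(RDF + "subject"), props.get(RDF + "predicate"),
                 props.get(RDF + "object"))
        if parts == triple:
            records.append({p: o for p, o in props.items() if p.startswith(EX)})
    return records


# Hypothetical annotation and placeholder date, for illustration only.
fact = ("http://purl.uniprot.org/uniprot/P04926", EX + "hasAnnotation", "example annotation")
graph = {fact} | reify(fact, "manual", "YYYY-MM-DD")
print(provenance_of(graph, fact))
```

Reification costs several extra triples per annotated statement. That overhead is one reason stores that track provenance at scale often group statements instead, for example in named graphs.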
UniProt Distribution
UniProt: Data Conventions
- All outbound RDF statements of a resource (statements about the same subject) are grouped together
- Datasets (nodes in the previous graph) are distributed as single files
- Only stated data is stored, not entailed data
  - For instance, relationships involving symmetric properties are stored in only one direction
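Only stated data is distributed, so a consumer that wants both directions of a symmetric relationship must derive the missing direction itself. The sketch below does this for every property declared owl:SymmetricProperty, using the OWL rule "if p is symmetric and x p y, then y p x". The ex:interactsWith property and the protein names are hypothetical.

```python
"""Sketch: materializing the entailed direction of symmetric properties."""

RDF_TYPE = "rdf:type"
SYMMETRIC = "owl:SymmetricProperty"


def materialize_symmetric(graph: set[tuple[str, str, str]]) -> set[tuple[str, str, str]]:
    """Add (y, p, x) for every stated (x, p, y) where p is declared symmetric.

    Assumes objects of symmetric properties are resources, not literals.
    One pass suffices: reversing an inferred triple gives back a stated one.
    """
    symmetric_props = {s for s, p, o in graph if p == RDF_TYPE and o == SYMMETRIC}
    inferred = {(o, p, s) for s, p, o in graph if p in symmetric_props}
    return graph | inferred


stated = {
    ("ex:interactsWith", RDF_TYPE, SYMMETRIC),
    ("ex:proteinA", "ex:interactsWith", "ex:proteinB"),  # stored in one direction only
}
full = materialize_symmetric(stated)
print(("ex:proteinB", "ex:interactsWith", "ex:proteinA") in full)  # True
```

Storing one direction halves the size of the distributed files. The cost is that queries must either run this step first or match both directions.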
UniProt: Naming Conventions
- In semiotics, a symbol generally denotes a referent
- In Web architecture, URIs identify resources
  - URIs that can be resolved over the Web are URLs
- UniProt URIs identify:
  - Resources that correspond to database entries
  - Modeling vocabulary from standard namespaces (RDFS and OWL)
  - Classes and properties used by UniProt
    - For example: http://purl.uniprot.org/core/Gene
  - Resources that lack stable identifiers in their source database
The Omics Identification Problem
- UniProt uses a templated naming convention:
  - http://purl.uniprot.org/{database}/{identifier}
  - http://purl.uniprot.org/uniprot/{protein_identifier}
- Problem:
  - http://purl.uniprot.org/uniprot/P04926 denotes the Malaria protein EX-1
  - If loading that address in a browser returns a web page, can an automaton infer that Malaria protein EX-1 is a web page?
  - How do you distinguish URIs that identify abstract concepts from URIs that identify digital media?
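The templated convention is easy to apply mechanically. The sketch below builds and parses URIs of the form http://purl.uniprot.org/{database}/{identifier}. The helper names are hypothetical, and the parser checks only the shape of the template, not whether the database or identifier actually exists.

```python
"""Sketch: building and parsing UniProt-style templated PURLs."""
from urllib.parse import quote, unquote, urlsplit

BASE = "http://purl.uniprot.org/"


def make_uri(database: str, identifier: str) -> str:
    """Fill in http://purl.uniprot.org/{database}/{identifier}."""
    return BASE + quote(database, safe="") + "/" + quote(identifier, safe="")


def parse_uri(uri: str) -> tuple[str, str]:
    """Split a templated URI back into (database, identifier)."""
    parts = urlsplit(uri)
    if f"{parts.scheme}://{parts.netloc}/" != BASE:
        raise ValueError(f"not under {BASE}: {uri}")
    segments = parts.path.strip("/").split("/")
    if len(segments) != 2:
        raise ValueError(f"expected /{{database}}/{{identifier}}: {uri}")
    return unquote(segments[0]), unquote(segments[1])


uri = make_uri("uniprot", "P04926")
print(uri)             # http://purl.uniprot.org/uniprot/P04926
print(parse_uri(uri))  # ('uniprot', 'P04926')
```

Nothing in the string itself says whether the URI names a protein or a web page about it. Resolving that ambiguity requires HTTP-level behaviour, not a naming rule.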
The PURL Solution
- PURL (Persistent Uniform Resource Locator) is a public URI management service for allocating a ‘URI space’: a mapping of identifiers (aliases) to resources that the party allocating them is not directly responsible for
- PURLs are web addresses that act as permanent identifiers in the face of a dynamic and changing Web infrastructure
- A request to a PURL returns an HTTP 303 (See Other) status code and a location:
  - 303 indicates that a response can be found at the returned location
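At its core, a PURL service is a lookup table from persistent paths to current locations, answered with a redirect. The sketch below models that lookup as a pure function that returns a status code and a Location value. The alias table and its target URL are hypothetical, and a deployed service would wrap this logic in an HTTP server.

```python
"""Sketch: a PURL-style resolver that answers with 303 See Other."""
from http import HTTPStatus
from typing import Optional

# Hypothetical alias table: persistent path -> current location.
ALIASES = {
    "/uniprot/P04926": "http://www.uniprot.org/uniprot/P04926",
}


def resolve(path: str) -> tuple[int, Optional[str]]:
    """Return (status code, Location header value) for a request to `path`."""
    target = ALIASES.get(path)
    if target is None:
        return HTTPStatus.NOT_FOUND, None
    # 303 See Other: the response can be retrieved from `target` with a GET.
    return HTTPStatus.SEE_OTHER, target


print(resolve("/uniprot/P04926"))
```

Moving a resource then means editing one table entry, while the identifier that other datasets cite stays the same.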
The PURL Solution: Continued
- PURL addresses can be used to identify abstract concepts
- Requests to such addresses are redirected to an informative web page (for humans) that also gives machines a means to extract other formats
- As a result, RDF statements are about proteins, machines can reason about proteins, and humans can resolve protein identifiers to view informative web pages
- RDF/XML link:
  - http://www.uniprot.org/uniprot/P04926.rdf
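One way to serve humans and machines from the same identifier is HTTP content negotiation, where the 303 target depends on the client's Accept header. The sketch below redirects clients that accept application/rdf+xml to an .rdf document and all other clients to an HTML page. The .rdf suffix follows the RDF/XML link above. The function name and the human-facing prefix are assumptions, and the selection logic is deliberately simplified: it ignores q-values.

```python
"""Sketch: choosing a 303 target by content negotiation (simplified, no q-values)."""
from http import HTTPStatus

PURL_PREFIX = "http://purl.uniprot.org/uniprot/"
SITE_PREFIX = "http://www.uniprot.org/uniprot/"  # assumed human-facing location


def negotiate(purl: str, accept: str) -> tuple[int, str]:
    """Send clients that accept RDF/XML to the .rdf document, others to HTML."""
    if not purl.startswith(PURL_PREFIX):
        raise ValueError(f"unexpected PURL: {purl}")
    accession = purl[len(PURL_PREFIX):]
    # Media types listed in the Accept header, with parameters such as q= dropped.
    media_types = {part.split(";")[0].strip().lower() for part in accept.split(",")}
    if "application/rdf+xml" in media_types:
        return HTTPStatus.SEE_OTHER, f"{SITE_PREFIX}{accession}.rdf"
    return HTTPStatus.SEE_OTHER, f"{SITE_PREFIX}{accession}"


print(negotiate(PURL_PREFIX + "P04926", "application/rdf+xml"))
print(negotiate(PURL_PREFIX + "P04926", "text/html,application/xhtml+xml"))
```

The protein keeps a single identifier, and each kind of client ends up at a document suited to it. The 303 also signals that the document is not the protein itself.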
UniProt: Protein Class
UniProt: Annotation Hierarchy
Serendipitous Re-use
- Having a rich repository of protein sequence metadata, annotations, and taxonomic classification in a distributed, standard format encourages scientific collaboration
General UniProt Re-Use Scenario
- User A refers to protein P1 in their dataset
  - User A's dataset doesn't include statements about P1 (for instance, its host organism)
- User B comes across this dataset and, to find out more about protein P1, enters the URI of P1 in a browser and pulls up human-readable information about it (including the host organism)
- Automaton C comes across the same dataset, fetches the web page and then the RDF about P1, and so has access to the same information as User B; it can also reason about the major taxon to which the host organism belongs
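Automaton C's behaviour can be sketched as "follow your nose" integration:
1. Collect the URIs mentioned in a dataset.
2. Dereference them to obtain more RDF.
3. Merge the results.
4. Reason over the merged graph.

In this sketch, a dictionary stands in for HTTP dereferencing. The protein, organism, and taxon URIs and the ex:organism, ex:parentTaxon, and ex:rank properties are hypothetical placeholders, not UniProt's actual vocabulary or data.

```python
"""Sketch: 'follow your nose' merging of RDF, then a walk up a taxonomy."""
from typing import Optional

Triple = tuple[str, str, str]

# User A's dataset mentions ex:P1 but says nothing about its host organism.
dataset_a: set[Triple] = {("ex:experiment1", "ex:studies", "ex:P1")}

# Simulated Web: the triples that dereferencing each URI would return.
# All URIs, properties, and values here are hypothetical placeholders.
WEB: dict[str, set[Triple]] = {
    "ex:P1": {("ex:P1", "ex:organism", "ex:speciesX")},
    "ex:speciesX": {("ex:speciesX", "ex:parentTaxon", "ex:genusY")},
    "ex:genusY": {("ex:genusY", "ex:parentTaxon", "ex:phylumZ")},
    "ex:phylumZ": {("ex:phylumZ", "ex:rank", "major")},
}


def follow_your_nose(graph: set[Triple], max_rounds: int = 10) -> set[Triple]:
    """Dereference URIs mentioned in the graph and merge what they return."""
    merged = set(graph)
    fetched: set[str] = set()
    for _ in range(max_rounds):
        mentioned = {term for s, _p, o in merged for term in (s, o)}
        todo = (mentioned - fetched) & set(WEB)
        if not todo:
            break
        for uri in sorted(todo):
            fetched.add(uri)
            merged |= WEB[uri]  # no blank nodes here, so merging is a plain set union
    return merged


def major_taxon(graph: set[Triple], protein: str) -> Optional[str]:
    """Walk from the protein's organism up ex:parentTaxon links to a 'major' rank."""
    lookup = {(s, p): o for s, p, o in graph}
    taxon = lookup.get((protein, "ex:organism"))
    seen: set[str] = set()
    while taxon is not None and taxon not in seen:
        if lookup.get((taxon, "ex:rank")) == "major":
            return taxon
        seen.add(taxon)
        taxon = lookup.get((taxon, "ex:parentTaxon"))
    return None


print(major_taxon(dataset_a, "ex:P1"))  # None: the local data alone is not enough
print(major_taxon(follow_your_nose(dataset_a), "ex:P1"))  # ex:phylumZ
```

The same question fails on User A's dataset alone and succeeds after merging. This illustrates the serendipitous re-use the scenario describes.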
Gokuldas Hospital
 
Acute Gout Care & Urate Lowering Therapy .pdf
Acute Gout Care & Urate Lowering Therapy .pdfAcute Gout Care & Urate Lowering Therapy .pdf
Acute Gout Care & Urate Lowering Therapy .pdf
Jim Jacob Roy
 
SENSORY NEEDS B.SC. NURSING SEMESTER II.
SENSORY NEEDS B.SC. NURSING SEMESTER II.SENSORY NEEDS B.SC. NURSING SEMESTER II.
SENSORY NEEDS B.SC. NURSING SEMESTER II.
KULDEEP VYAS
 
Computer in pharmaceutical research and development-Mpharm(Pharmaceutics)
Computer in pharmaceutical research and development-Mpharm(Pharmaceutics)Computer in pharmaceutical research and development-Mpharm(Pharmaceutics)
Computer in pharmaceutical research and development-Mpharm(Pharmaceutics)
MuskanShingari
 
biomechanics of running. Dr.dhwani.pptx
biomechanics of running.   Dr.dhwani.pptxbiomechanics of running.   Dr.dhwani.pptx
biomechanics of running. Dr.dhwani.pptx
Dr. Dhwani kawedia
 
Demystifying Fallopian Tube Blockage- Grading the Differences and Implication...
Demystifying Fallopian Tube Blockage- Grading the Differences and Implication...Demystifying Fallopian Tube Blockage- Grading the Differences and Implication...
Demystifying Fallopian Tube Blockage- Grading the Differences and Implication...
FFragrant
 
Call Girls In Mumbai +91-7426014248 High Profile Call Girl Mumbai
Call Girls In Mumbai +91-7426014248 High Profile Call Girl MumbaiCall Girls In Mumbai +91-7426014248 High Profile Call Girl Mumbai
Call Girls In Mumbai +91-7426014248 High Profile Call Girl Mumbai
Mobile Problem
 
June 2024 Oncology Cartoons By Dr Kanhu Charan Patro
June 2024 Oncology Cartoons By Dr Kanhu Charan PatroJune 2024 Oncology Cartoons By Dr Kanhu Charan Patro
June 2024 Oncology Cartoons By Dr Kanhu Charan Patro
Kanhu Charan
 

Recently uploaded (20)

Skin Diseases That Happen During Summer.
 Skin Diseases That Happen During Summer. Skin Diseases That Happen During Summer.
Skin Diseases That Happen During Summer.
 
13. PROM premature rupture of membranes
13.  PROM premature rupture of membranes13.  PROM premature rupture of membranes
13. PROM premature rupture of membranes
 
CBL Seminar 2024_Preliminary Program.pdf
CBL Seminar 2024_Preliminary Program.pdfCBL Seminar 2024_Preliminary Program.pdf
CBL Seminar 2024_Preliminary Program.pdf
 
pharmacology for dummies free pdf download.pdf
pharmacology for dummies free pdf download.pdfpharmacology for dummies free pdf download.pdf
pharmacology for dummies free pdf download.pdf
 
“Psychiatry and the Humanities”: An Innovative Course at the University of Mo...
“Psychiatry and the Humanities”: An Innovative Course at the University of Mo...“Psychiatry and the Humanities”: An Innovative Course at the University of Mo...
“Psychiatry and the Humanities”: An Innovative Course at the University of Mo...
 
Ageing, the Elderly, Gerontology and Public Health
Ageing, the Elderly, Gerontology and Public HealthAgeing, the Elderly, Gerontology and Public Health
Ageing, the Elderly, Gerontology and Public Health
 
Local anesthetics 2024/ Medicinal Chemistry pdf
Local anesthetics 2024/ Medicinal Chemistry pdfLocal anesthetics 2024/ Medicinal Chemistry pdf
Local anesthetics 2024/ Medicinal Chemistry pdf
 
What is Obesity? How to overcome Obesity?
What is Obesity? How to overcome Obesity?What is Obesity? How to overcome Obesity?
What is Obesity? How to overcome Obesity?
 
NARCOTICS- POLICY AND PROCEDURES FOR ITS USE
NARCOTICS- POLICY AND PROCEDURES FOR ITS USENARCOTICS- POLICY AND PROCEDURES FOR ITS USE
NARCOTICS- POLICY AND PROCEDURES FOR ITS USE
 
Post-Menstrual Smell- When to Suspect Vaginitis.pptx
Post-Menstrual Smell- When to Suspect Vaginitis.pptxPost-Menstrual Smell- When to Suspect Vaginitis.pptx
Post-Menstrual Smell- When to Suspect Vaginitis.pptx
 
Recent advances on Cervical cancer .pptx
Recent advances on Cervical cancer .pptxRecent advances on Cervical cancer .pptx
Recent advances on Cervical cancer .pptx
 
Osvaldo Bernardo Muchanga-GASTROINTESTINAL INFECTIONS AND GASTRITIS-2024.pdf
Osvaldo Bernardo Muchanga-GASTROINTESTINAL INFECTIONS AND GASTRITIS-2024.pdfOsvaldo Bernardo Muchanga-GASTROINTESTINAL INFECTIONS AND GASTRITIS-2024.pdf
Osvaldo Bernardo Muchanga-GASTROINTESTINAL INFECTIONS AND GASTRITIS-2024.pdf
 
What are the different types of Dental implants.
What are the different types of Dental implants.What are the different types of Dental implants.
What are the different types of Dental implants.
 
Acute Gout Care & Urate Lowering Therapy .pdf
Acute Gout Care & Urate Lowering Therapy .pdfAcute Gout Care & Urate Lowering Therapy .pdf
Acute Gout Care & Urate Lowering Therapy .pdf
 
SENSORY NEEDS B.SC. NURSING SEMESTER II.
SENSORY NEEDS B.SC. NURSING SEMESTER II.SENSORY NEEDS B.SC. NURSING SEMESTER II.
SENSORY NEEDS B.SC. NURSING SEMESTER II.
 
Computer in pharmaceutical research and development-Mpharm(Pharmaceutics)
Computer in pharmaceutical research and development-Mpharm(Pharmaceutics)Computer in pharmaceutical research and development-Mpharm(Pharmaceutics)
Computer in pharmaceutical research and development-Mpharm(Pharmaceutics)
 
biomechanics of running. Dr.dhwani.pptx
biomechanics of running.   Dr.dhwani.pptxbiomechanics of running.   Dr.dhwani.pptx
biomechanics of running. Dr.dhwani.pptx
 
Demystifying Fallopian Tube Blockage- Grading the Differences and Implication...
Demystifying Fallopian Tube Blockage- Grading the Differences and Implication...Demystifying Fallopian Tube Blockage- Grading the Differences and Implication...
Demystifying Fallopian Tube Blockage- Grading the Differences and Implication...
 
Call Girls In Mumbai +91-7426014248 High Profile Call Girl Mumbai
Call Girls In Mumbai +91-7426014248 High Profile Call Girl MumbaiCall Girls In Mumbai +91-7426014248 High Profile Call Girl Mumbai
Call Girls In Mumbai +91-7426014248 High Profile Call Girl Mumbai
 
June 2024 Oncology Cartoons By Dr Kanhu Charan Patro
June 2024 Oncology Cartoons By Dr Kanhu Charan PatroJune 2024 Oncology Cartoons By Dr Kanhu Charan Patro
June 2024 Oncology Cartoons By Dr Kanhu Charan Patro
 

UniProt and the Semantic Web

  • 7. UniProt: Components
    - UniProtKB: protein sequence annotations and metadata
      - Protein name, function, taxonomy, enzyme-specific information, domains, sites, subcellular location, interactions, relationships to disease, etc.
      - Links to external sources: DNA sequence repositories, protein structure databases, protein domain and family databases, and species- and function-specific data collections
    - UniRef: clusters sequences at different resolutions to compress redundancy
      - Parameterized by the percent identity between sequences or sub-sequences (100%, 90%, 50%)
    - UniParc: a non-redundant database of all publicly available protein sequences
      - Manages globally unique identifiers, the sequence, information on the source database, and a CRC checksum
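UniParc's CRC value lets two sequences be checked for probable identity without comparing them residue by residue. The sketch below assumes the CRC64 variant found in Swiss-Prot/UniProt sequence records (reflected ISO 3309 polynomial 0xD800000000000000, zero initial value, no final XOR); it is an illustration, not UniParc's own code, and `crc64` is a hypothetical helper name.

```python
# Sketch of a CRC64 sequence checksum (assumed variant: reflected ISO 3309
# polynomial, zero initial value, no final XOR). Not UniParc's own code.

_POLY = 0xD800000000000000  # reflected form of x^64 + x^4 + x^3 + x + 1


def _build_table() -> list[int]:
    """Precompute the byte-wise lookup table for the reflected polynomial."""
    table = []
    for i in range(256):
        part = i
        for _ in range(8):
            # Shift right one bit; fold in the polynomial if a 1 bit fell off.
            part = (part >> 1) ^ _POLY if part & 1 else part >> 1
        table.append(part)
    return table


_TABLE = _build_table()


def crc64(sequence: str) -> str:
    """Return the CRC64 of a sequence as 16 uppercase hexadecimal digits."""
    crc = 0
    for byte in sequence.encode("ascii"):
        crc = (crc >> 8) ^ _TABLE[(crc ^ byte) & 0xFF]
    return f"{crc:016X}"


# Arbitrary example sequence; identical sequences always share a checksum.
print(crc64("MKVLAAGIVALLLAAGCSS"))
```

Because different sequences can in principle collide, a matching checksum is best treated as "probably identical" and confirmed by comparing the sequences themselves.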
  • 8. Semantic Web Technologies
    - A set of standards for managing web-based content in a way that emphasizes use by an automaton
      - Automaton: a machine that performs a function according to a predetermined set of coded instructions
    - The architectural vision (the Semantic Web) is to extend the standards and best practices behind the World Wide Web with new standards that emphasize the meaning of data over its structure
      - Common data formats
      - A means to make assertions about the world such that an automaton can reason about it through them
    - The vision is often confused with the tools meant to achieve it (i.e., the set of standards)
  • 10. RDF: Data Model
    - A standardized format for representing arbitrary information as a labelled, directed graph
    - Composed of statements (triples): subject, predicate, object
    - Terms in statements can be Uniform Resource Identifiers (URIs), blank nodes (anonymous entities), or literals
    - The abstract data model (a labelled, directed graph) has various serializations: XML-based and text-based
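To make the three kinds of terms concrete, here is a minimal sketch that models URIs, blank nodes, and literals as Python types and writes a small graph in N-Triples, one of the text-based serializations. The `http://example.org/annotation` predicate, the blank-node label, and the literal values are hypothetical, and the class URI `http://purl.uniprot.org/core/Protein` is assumed by analogy with the `core/Gene` example in the slides. Literal escaping covers only backslashes, quotes, and newlines.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class URI:
    value: str

    def n3(self) -> str:
        return f"<{self.value}>"


@dataclass(frozen=True)
class BNode:
    """Blank node: an anonymous entity, named only by a local label."""
    label: str

    def n3(self) -> str:
        return f"_:{self.label}"


@dataclass(frozen=True)
class Literal:
    lexical: str

    def n3(self) -> str:
        # Minimal escaping: backslash first, then quotes and newlines.
        escaped = (self.lexical.replace("\\", "\\\\")
                   .replace('"', '\\"').replace("\n", "\\n"))
        return f'"{escaped}"'


Term = Union[URI, BNode, Literal]
# (subject, predicate, object); RDF forbids literal subjects, not enforced here.
Triple = tuple[Term, URI, Term]


def to_ntriples(graph: set[Triple]) -> str:
    """Serialize a set of triples as sorted N-Triples lines."""
    return "\n".join(sorted(f"{s.n3()} {p.n3()} {o.n3()} ." for s, p, o in graph))


RDF_TYPE = URI("http://www.w3.org/1999/02/22-rdf-syntax-ns#type")
RDFS_LABEL = URI("http://www.w3.org/2000/01/rdf-schema#label")
RDFS_COMMENT = URI("http://www.w3.org/2000/01/rdf-schema#comment")
P04926 = URI("http://purl.uniprot.org/uniprot/P04926")
NOTE = URI("http://example.org/annotation")  # hypothetical predicate
note_node = BNode("a1")

graph = {
    (P04926, RDF_TYPE, URI("http://purl.uniprot.org/core/Protein")),
    (P04926, RDFS_LABEL, Literal("Malaria protein EX-1")),
    (P04926, NOTE, note_node),
    (note_node, RDFS_COMMENT, Literal("an anonymous annotation")),
}
print(to_ntriples(graph))
```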
  • 12. Modelling Vocabulary: RDFS/OWL
    - RDF Schema (RDFS)
      - A simple, minimal schema language for RDF
    - Web Ontology Language (OWL)
      - A vocabulary for defining classes, relationships, and various constraints that limit how RDF is interpreted
      - A more powerful modelling language
    - Tools for constraining and defining reality that can be used to codify scientific understanding
      - The Gene Ontology is modelled in this way to capture our understanding of macromolecular reality
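What a schema lets a reasoner derive, as opposed to what is stated, can be shown with two RDFS entailment rules: `rdfs:subClassOf` is transitive (rdfs11), and an instance of a class is also an instance of its superclasses (rdfs9). The sketch below computes that closure by naive fixpoint iteration over string triples; the `ex:` class and instance names are hypothetical, and it implements only these two rules, not full RDFS or OWL semantics.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

Triple = tuple[str, str, str]


def rdfs_closure(stated: set[Triple]) -> set[Triple]:
    """Apply rdfs11 (subClassOf is transitive) and rdfs9 (types propagate
    to superclasses) until no new triples appear."""
    graph = set(stated)
    while True:
        subclass = [(s, o) for s, p, o in graph if p == SUBCLASS_OF]
        new = set()
        # rdfs11: A subClassOf B, B subClassOf C  =>  A subClassOf C
        for a, b in subclass:
            for c, d in subclass:
                if b == c:
                    new.add((a, SUBCLASS_OF, d))
        # rdfs9: x type A, A subClassOf B  =>  x type B
        for s, p, o in graph:
            if p == RDF_TYPE:
                for sub, sup in subclass:
                    if o == sub:
                        new.add((s, RDF_TYPE, sup))
        if new <= graph:
            return graph
        graph |= new


stated = {
    ("ex:ProteinKinase", SUBCLASS_OF, "ex:Enzyme"),
    ("ex:Enzyme", SUBCLASS_OF, "ex:Protein"),
    ("ex:P1", RDF_TYPE, "ex:ProteinKinase"),
}
entailed = rdfs_closure(stated) - stated
print(sorted(entailed))
```

Only the three stated triples need to be stored; the three entailed ones can always be recomputed by a consumer that wants them.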
  • 14. Query Language: SPARQL
    - Provides a common graph-matching language for querying RDF data
    - Similar to SQL in many respects
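SPARQL's core operation is matching a basic graph pattern (triple patterns containing variables) against the data, which behaves much like a join in SQL. The sketch below is a naive backtracking matcher written only to illustrate that idea, not a SPARQL engine. The query it runs corresponds roughly to `SELECT ?protein ?organism WHERE { ?protein rdf:type up:Protein . ?protein up:organism ?organism }`; the prefixed names are illustrative and the `ex:` resources are hypothetical.

```python
from typing import Optional

Triple = tuple[str, str, str]
Binding = dict[str, str]


def is_var(term: str) -> bool:
    return term.startswith("?")


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend binding so that pattern matches triple, or return None."""
    extended = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            if extended.setdefault(p_term, t_term) != t_term:
                return None  # variable already bound to a different term
        elif p_term != t_term:
            return None
    return extended


def solve(patterns: list[Triple], graph: list[Triple],
          binding: Optional[Binding] = None) -> list[Binding]:
    """Return every binding that satisfies all triple patterns."""
    binding = {} if binding is None else binding
    if not patterns:
        return [binding]
    results = []
    for triple in graph:
        extended = match(patterns[0], triple, binding)
        if extended is not None:
            results.extend(solve(patterns[1:], graph, extended))
    return results


graph = [
    ("ex:P1", "rdf:type", "up:Protein"),
    ("ex:P1", "up:organism", "ex:Org1"),
    ("ex:P2", "rdf:type", "up:Protein"),
    ("ex:P2", "up:organism", "ex:Org2"),
    ("ex:Org1", "rdf:type", "up:Taxon"),
]
query = [("?protein", "rdf:type", "up:Protein"),
         ("?protein", "up:organism", "?organism")]
for solution in solve(query, graph):
    print(solution)
```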
  • 15. Nature of UniProt Data
    - A very large number of cross-references to external resources
    - The cross-reference topology is that of a graph, not a tree
    - Automated and manual annotation require storage of provenance information (how and when data was acquired)
    - Requires a framework for data as well as metadata (data about data)
  • 17. UniProt: Data Conventions
    - All outbound RDF statements (statements about the same subject) are grouped together
    - Datasets (nodes in the previous graph) are each distributed as a single file
    - Only stated data is stored, not entailed data
      - For instance, relationships involving symmetric properties are stored in only one direction
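These storage conventions can be sketched as follows, assuming a hypothetical symmetric property `ex:interactsWith`: each symmetric statement is kept in one canonical direction, the reverse is added back only in a derived view, and stored statements are grouped by subject. This illustrates the conventions; it is not UniProt's export code.

```python
from collections import defaultdict

SYMMETRIC = {"ex:interactsWith"}  # hypothetical symmetric property

Triple = tuple[str, str, str]


def canonicalize(triples: set[Triple]) -> set[Triple]:
    """Store stated data only: one direction per symmetric statement."""
    stored = set()
    for s, p, o in triples:
        if p in SYMMETRIC and o < s:
            s, o = o, s  # canonical direction: smaller term as subject
        stored.add((s, p, o))
    return stored


def entailed_view(stored: set[Triple]) -> set[Triple]:
    """Recompute the reverse direction of symmetric statements on demand."""
    view = set(stored)
    for s, p, o in stored:
        if p in SYMMETRIC:
            view.add((o, p, s))
    return view


def group_by_subject(triples: set[Triple]) -> dict[str, list[tuple[str, str]]]:
    """Group outbound statements so each subject's data sits together."""
    groups = defaultdict(list)
    for s, p, o in sorted(triples):
        groups[s].append((p, o))
    return dict(groups)


stated = {
    ("ex:P1", "ex:interactsWith", "ex:P2"),
    ("ex:P2", "ex:interactsWith", "ex:P1"),
    ("ex:P1", "ex:name", "EX-1"),
}
stored = canonicalize(stated)
print(group_by_subject(stored))
```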
  • 19. UniProt: Naming Conventions
    - Generally, in semiotics, a symbol denotes a referent
    - In Web architecture, URIs identify resources
      - URIs that can be resolved over the Web are URLs
    - UniProt URIs identify:
      - Resources that correspond to database entries
      - Modelling vocabulary that uses standard namespaces: RDFS and OWL
      - Classes and properties used by UniProt
        - For example: http://purl.uniprot.org/core/Gene
      - Resources without stable identifiers (from their source)
  • 20. The Omics Identification Problem
    - UniProt uses a templated naming convention:
      - http://purl.uniprot.org/{database}/{identifier}
      - http://purl.uniprot.org/uniprot/{protein_identifier}
    - Problem:
      - http://purl.uniprot.org/uniprot/P04926 denotes the Malaria protein EX-1
      - If loading that address in a browser returns a web page, can an automaton infer that Malaria protein EX-1 is a web page?
      - How do you distinguish identifiers for abstract concepts from identifiers for digital media?
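The templated convention can be sketched as a pair of helpers, `purl` and `parse_purl` (hypothetical names), that fill the `http://purl.uniprot.org/{database}/{identifier}` template and split a URI back into its two parts. Percent-encoding each part is an added assumption to keep unusual identifiers from breaking the path.

```python
from urllib.parse import quote, unquote, urlsplit

BASE = "http://purl.uniprot.org"


def purl(database: str, identifier: str) -> str:
    """Fill the template http://purl.uniprot.org/{database}/{identifier}."""
    return f"{BASE}/{quote(database, safe='')}/{quote(identifier, safe='')}"


def parse_purl(uri: str) -> tuple[str, str]:
    """Split a templated PURL back into (database, identifier)."""
    parts = urlsplit(uri)
    if parts.scheme != "http" or parts.netloc != "purl.uniprot.org":
        raise ValueError(f"not a purl.uniprot.org URI: {uri}")
    segments = parts.path.strip("/").split("/")
    if len(segments) != 2:
        raise ValueError(f"expected /{{database}}/{{identifier}}: {uri}")
    database, identifier = segments
    return unquote(database), unquote(identifier)


print(purl("uniprot", "P04926"))  # http://purl.uniprot.org/uniprot/P04926
print(parse_purl("http://purl.uniprot.org/core/Gene"))  # ('core', 'Gene')
```

Note that the template by itself says nothing about whether the URI names a protein or a document about it; that question is what the redirect convention addresses.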
  • 21. The PURL Solution
    - A Persistent Uniform Resource Locator (PURL) service is a public URI management service that allocates a "URI space" as a mapping of identifiers (aliases) for resources it is not directly responsible for
    - PURLs are web addresses that act as permanent identifiers in the face of a dynamic and changing Web infrastructure
    - A request to a PURL returns a 303 HTTP status code and a location:
      - 303 (See Other) indicates that a response can be found under the returned location
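The redirect behaviour can be sketched as a pure function that maps a PURL and an `Accept` header to a status code and location. The targets are assumptions: the RDF/XML form follows the `http://www.uniprot.org/uniprot/P04926.rdf` link shown in these slides, while the HTML location is hypothetical. The `Accept` check is a simplified substring test that ignores quality values, and `resolve` is a hypothetical name.

```python
from http import HTTPStatus
from urllib.parse import urlsplit

# Assumed targets: the .rdf form mirrors the RDF/XML link in the slides;
# the HTML location is a hypothetical choice for this sketch.
WEB_BASE = "http://www.uniprot.org"


def resolve(purl: str, accept: str = "text/html") -> tuple[int, str]:
    """Return (status, location) for a request to a PURL.

    Always answers 303 See Other: the PURL names the protein itself, so the
    server points to a document about it instead of returning one.
    """
    parts = urlsplit(purl)
    if parts.netloc != "purl.uniprot.org":
        raise ValueError(f"not a purl.uniprot.org URI: {purl}")
    location = WEB_BASE + parts.path
    # Simplified content negotiation: substring test, q-values ignored.
    if "application/rdf+xml" in accept:
        location += ".rdf"
    return int(HTTPStatus.SEE_OTHER), location


print(resolve("http://purl.uniprot.org/uniprot/P04926"))
print(resolve("http://purl.uniprot.org/uniprot/P04926", "application/rdf+xml"))
```

Answering 303 rather than 200 is what keeps an automaton from concluding that the protein is the web page it eventually receives.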
  • 22. The PURL Solution: Continued
    - PURL addresses can be used to identify abstract concepts
    - Requests to such addresses are redirected to an informative web page (for humans) that also gives machines a means to extract other formats
    - RDF statements are about proteins, machines can reason about proteins, and humans can resolve protein identifiers to view informative web pages
  • 23. RDF/XML link:
    - http://www.uniprot.org/uniprot/P04926.rdf
  • 26. Serendipitous Re-use
    - Having a rich repository of protein sequence metadata, annotations, and taxonomic classification in a distributed, standard format encourages scientific collaboration
  • 27. General UniProt Re-Use Scenario
    - User A refers to protein P1 in their dataset
    - User A's dataset doesn't include statements about P1 (the host organism, for instance)
    - User B comes across this dataset and, to find out more about protein P1, enters the URI of P1 in their browser and pulls up human-readable information about it (including the host organism)
    - Automaton C comes across the same dataset, fetches the web page, then fetches the RDF about P1; it has access to the same information as user B and can reason about which major taxon the host organism belongs to
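Automaton C's steps can be sketched with an in-memory dictionary standing in for HTTP dereferencing (`WEB`, `dereference`, `lineage`, and the `ex:` properties are hypothetical). Starting from P1, it follows the organism link and then parent-taxon links until the lineage runs out; the last entry is reported as the major taxon. The lineage shown is deliberately abbreviated and skips intermediate ranks.

```python
Triple = tuple[str, str, str]

# Hypothetical stand-in for fetching RDF over HTTP: URI -> triples about it.
WEB: dict[str, set[Triple]] = {
    "ex:P1": {("ex:P1", "ex:organism", "ex:Pfalciparum")},
    "ex:Pfalciparum": {("ex:Pfalciparum", "ex:parentTaxon", "ex:Plasmodium")},
    "ex:Plasmodium": {("ex:Plasmodium", "ex:parentTaxon", "ex:Apicomplexa")},
    "ex:Apicomplexa": {("ex:Apicomplexa", "ex:parentTaxon", "ex:Eukaryota")},
    "ex:Eukaryota": set(),
}


def dereference(uri: str) -> set[Triple]:
    """Return the RDF published at a URI (empty if nothing is found)."""
    return WEB.get(uri, set())


def objects(triples: set[Triple], subject: str, predicate: str) -> list[str]:
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


def lineage(protein: str) -> list[str]:
    """Follow organism, then parent-taxon links, from a protein URI."""
    organisms = objects(dereference(protein), protein, "ex:organism")
    if not organisms:
        return []
    chain, seen = [organisms[0]], {organisms[0]}
    while True:
        parents = objects(dereference(chain[-1]), chain[-1], "ex:parentTaxon")
        if not parents or parents[0] in seen:  # stop at the root or a cycle
            return chain
        chain.append(parents[0])
        seen.add(parents[0])


chain = lineage("ex:P1")
print(chain, "major taxon:", chain[-1])
```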
  • 28. References
    - Wu, C. et al., “The Universal Protein Resource (UniProt): an expanding universe of protein information.” Nucleic Acids Research, vol. 34, 2006.
    - Swiss Institute of Bioinformatics, “UniProt RDF (project page).” http://dev.isb-sib.ch/projects/uniprot-rdf/
    - Redaschi, N. and UniProt Consortium, “UniProt in RDF: Tackling Data Integration and Distributed Annotation.” Nature Precedings, 3rd International Biocuration Conference, April 2009. http://precedings.nature.com/documents/3193/version/1