Semantic web technologies applied to bioinformatics and laboratory data management
Presentation held at the PRBB Computational Genomics Seminars on March 18th.

Presentation Transcript

  • Semantic web technologies applied to bioinformatics and laboratory data management
      Toni Hermoso Pulido
    • [email_address]
    • Bioinformatics Core Facility
    • http://biocore.crg.cat
    • THE CLASSICAL WEB
      > Syntax
      • Markup languages (HTML, XHTML, etc.)
      > Content
      • Text inside the tags (or as attributes)
      > Style
      • HTML tags themselves
      • CSS (in content or as external files)
    Former WWW logo by Robert Cailliau. Tim Berners-Lee, Robert Cailliau. CERN (1990)
    • THE CLASSICAL WEB
    • WEB 2.0
      > Buzzword. Its coinage is commonly associated with Tim O'Reilly.
      > The term "Web 2.0" (2004–present) is commonly associated with web applications that facilitate interactive information sharing, interoperability, user-centered design, and collaboration on the World Wide Web.
      > Examples of Web 2.0 include web-based communities, hosted services, web applications, social-networking sites, video-sharing sites, wikis, blogs, mashups, and folksonomies.
      > AJAX, RSS, Web APIs…
    • wikis may allow anyone to edit
    • wikis are intended to be easy to use
    • wiki content is easy to link
    • wikis support tracking of all changes
    • wikis may allow uploading different media
    Wiki – Wiki ! WikiWikiWeb. Ward Cunningham - 1994
    • MediaWiki
      > Most popular wiki software
      > The software behind the Wikimedia Foundation projects
      > The best-known implementation is Wikipedia: http://www.wikipedia.org
      > First version in 2002. Before that, Wikipedia ran on UseModWiki (a Perl wiki)
    • Gene Wiki: Gene annotation project in Wikipedia
    http://en.wikipedia.org/wiki/Portal:Gene_Wiki
      > Brings relevant human gene information closer to end users
      > Manual collaborative annotation plus automated external references added by robot software
      > Wikipedia portal within Molecular and Cellular Biology Project
    Published September 2009
    • Gene Wiki: Gene annotation project in Wikipedia
    • GENE WIKI
      > Example of a wiki page
    • Reelin
    • GENE WIKI
      > Example of a wiki category page
    • Human proteins
    • GENE WIKI
      > Example of a wiki source page:
    • Reelin
    • GENE WIKI
      > Example of a wiki template page:
    • Reelin
    • Web parsing / scraping
    > To get information from an HTML source (wiki included)
    Download tools:
    • Lynx
    • Wget
    • Perl LWP
    • Perl WWW::Mechanize
    • Python Beautiful Soup…
    • Web parsing / scraping
    > Processing content. (example, EC: 3.4.21.-)
    • Regular expressions
      • m{<a href="http://www\.genome\.jp/dbget-bin/www_bget\?enzyme\+(\S+?)"}g
    • Xpath
      • id('bodyContent')/x:table[2]/x:tbody/x:tr[8]/x:td/x:span/x:a
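The regular-expression approach above can be sketched in Python as follows. This is a minimal illustration, not the script used in the talk: the HTML fragment is a hand-written stand-in for a page linking to the enzyme database, and the function name is hypothetical.

```python
import re

# Hypothetical HTML fragment shaped like a link to an enzyme entry (EC 3.4.21.-).
SAMPLE_HTML = (
    '<td><span><a href="http://www.genome.jp/dbget-bin/www_bget?enzyme+3.4.21.-">'
    "3.4.21.-</a></span></td>"
)

# Literal dots, '?' and '+' in the URL are escaped; the group captures the EC number.
EC_LINK = re.compile(
    r'<a href="http://www\.genome\.jp/dbget-bin/www_bget\?enzyme\+([^"\s]+)"'
)


def extract_ec_numbers(html):
    """Return every EC number found in enzyme links, in document order."""
    return EC_LINK.findall(html)


ec_numbers = extract_ec_numbers(SAMPLE_HTML)
print(ec_numbers)
```

Regular expressions are brittle against markup changes; the XPath route trades that for a dependency on the exact table layout of the page.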
    • MediaWiki API
    > http://en.wikipedia.org/w/api.php
    • MediaWiki API
    > Common scripting with Python or Perl: MediaWiki::Bot
    > You can read information from the wiki and store information in it.
    • MediaWiki API
    > Easier to extract data:
      • Retrieve wiki syntax, not direct HTML content
      • Useful when templates are used
      • Can retrieve all pages from a category
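As a rough sketch of the two uses listed above, the following Python builds request URLs for the API endpoint shown on the earlier slide. It relies on the standard `action=query` parameters (`list=categorymembers`, `prop=revisions`); the helper names and the sample response are assumptions made for illustration, and no network request is performed here.

```python
import json
from urllib.parse import urlencode

API_URL = "http://en.wikipedia.org/w/api.php"  # endpoint named on the slide


def category_members_url(category, limit=50):
    """URL listing the pages that belong to a category."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": "Category:" + category,
        "cmlimit": limit,
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)


def wikitext_url(title):
    """URL returning the wiki syntax (not rendered HTML) of a page."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "titles": title,
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)


def titles_from_response(text):
    """Pull page titles out of a categorymembers JSON response."""
    data = json.loads(text)
    return [member["title"] for member in data["query"]["categorymembers"]]


# Hand-written sample in the shape of a categorymembers reply (not real output).
SAMPLE_RESPONSE = (
    '{"query": {"categorymembers": [{"pageid": 1, "ns": 0, "title": "Reelin"}]}}'
)

print(category_members_url("Human proteins"))
print(wikitext_url("Reelin"))
print(titles_from_response(SAMPLE_RESPONSE))
```

Fetching the resulting URLs with any HTTP client returns JSON, which is easier to process than scraped HTML, especially for pages built from templates.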
    • SEMANTIC WEB
    > "The Semantic Web is not a separate Web but an extension of the current one, in which information is given well-defined meaning, better enabling computers and people to work in cooperation."
      Sir Tim Berners-Lee
  • The Berners-Lee Semantic Web ‘Birthday Cake’ http://www.mkbergman.com/231/from-data-federation-pyramid-to-the-semantic-web-birthday-cake/
  • The evolution of the Web
    • SNPedia: a semantic wiki for human genetic studies
      > http://www.snpedia.com (started in 2007)
      > Built on Semantic MediaWiki (first releases in 2005)
      > Database of SNPs (Single Nucleotide Polymorphisms)
      > In September 2009, the website claimed 7,938 SNPs in its database.
      > Predictive medicine reports with Promethease: an application that queries SNPedia against your own genotyping data
    • SNPedia: a semantic wiki for human genetic studies
    • SNPedia
      > Example of a wiki page
    • Rs333
    • SNPedia
      > Example of a wiki page
    • Rs333
    • SNPedia
      > Example of a wiki page properties
    • Rs333
    • SNPedia
      > Example of a page property ( disease ) value
    • HIV
    • Semantic MediaWiki Data Types
    * Type:Page: links to pages (the default)
    * Type:String: text strings that are not longer than 250 characters
    * Type:Number: integer and decimal numbers with optional exponent
    * Type:Boolean: restricts the value of a property to true/false (also 1/0 and yes/no)
    * Type:Date: specifies particular points in time
    * Type:Text: like Type:String but can have unlimited length; the trade-off is that values of this type cannot be selection or sort criteria in queries.
    * Type:Code: like Type:Text but with additional precautions to preserve special formatting as used for technical texts. The value displays as regular text everywhere else (query results, factbox, "Pages using the property", etc.).
    * Type:Temperature: variation of Type:Number that supports units of temperature (cannot be user-defined since converting temperature units is more complicated than multiplying by a conversion factor).
    * Type:Telephone number: validates and stores international telephone numbers based on the RFC 3966 standard
    * Type:Record: type for compound property values that consist of a short list of values with fixed type and order
    • Semantic MediaWiki Data Types
    For specifying URLs and emails, there are some special variations of the string data type:
    * Type:URL: displays an external link to its URL object
    * Type:Email: displays an e-mail address as a link (with mailto:)
    * Type:Annotation URI: similar to Type:URL but with some technical differences in SMW's RDF export
    Some extensions provide further types:
    * Type:Geographic coordinate (provided by Semantic Maps): describes geographic locations. Different forms of geographic coordinates are supported.
    http://semantic-mediawiki.org/wiki/Help:Properties_and_types
    • SNPedia – RDF behind a wiki page
    • RDF (Resource Description Framework)
      > Triple {subject, property/predicate, object}
      > Defines and describes data and the relations among data
      > Suitable for attaching metadata to resources
      > Understood by machines (not so much by humans…)
      > Normally in XML format
      > Alternative: RDFa (directly in XHTML pages)
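To make the triple model concrete, here is a minimal Python sketch that stores statements as (subject, predicate, object) tuples and filters them by pattern. The first triple mirrors the SNPedia example (Rs333, disease, HIV); the other triples and all identifiers are simplified, hypothetical labels rather than real URIs, and the `match` helper is an illustration, not part of any RDF library.

```python
# Each RDF statement is one (subject, predicate, object) triple.
# Identifiers are simplified, hypothetical labels, not real URIs.
TRIPLES = [
    ("Rs333", "disease", "HIV"),
    ("Rs333", "type", "SNP"),
    ("Reelin", "type", "Protein"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that agree with every non-None field (None is a wildcard)."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


# Everything stated about Rs333.
about_rs333 = match(TRIPLES, subject="Rs333")

# Every subject declared to be an SNP.
snps = [s for (s, _, _) in match(TRIPLES, predicate="type", obj="SNP")]

print(about_rs333)
print(snps)
```

Matching with wildcards is the same idea that a query language for RDF builds on, only with variables and joins across several patterns.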
    • RDF: Gene Ontology
  • OWL: Gene Ontology
    • SPARQL
    RDF query language; its name is a recursive acronym that stands for SPARQL Protocol and RDF Query Language.
    • Example query of Wikipedia data (through DBpedia): http://dbpedia.org/sparql
    • Example query of biological resources:
    • http://www.semantic-systems-biology.org/biogateway/querying
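A query against such an endpoint could look like the following Python sketch. The endpoint URL comes from the slide; the query text, the resource URI and the helper names are illustrative assumptions, and the sample result is hand-written in the standard SPARQL JSON results layout instead of being fetched over the network.

```python
import json
from urllib.parse import urlencode

ENDPOINT = "http://dbpedia.org/sparql"  # endpoint named on the slide

# Illustrative query: English label of an assumed DBpedia resource.
QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?label WHERE {
  <http://dbpedia.org/resource/Reelin> rdfs:label ?label .
  FILTER (lang(?label) = "en")
}
"""


def sparql_url(endpoint, query):
    """Build a GET URL; 'query' is the parameter defined by the SPARQL protocol."""
    return endpoint + "?" + urlencode({"query": query})


def column(results_json, var):
    """Extract the values bound to one variable in a SPARQL JSON results document."""
    data = json.loads(results_json)
    return [row[var]["value"] for row in data["results"]["bindings"] if var in row]


# Hand-written sample in the SPARQL JSON results layout (not real endpoint output).
SAMPLE = json.dumps({
    "head": {"vars": ["label"]},
    "results": {"bindings": [
        {"label": {"type": "literal", "xml:lang": "en", "value": "Reelin"}}
    ]},
})

print(sparql_url(ENDPOINT, QUERY))
print(column(SAMPLE, "label"))
```

When the URL is actually requested, JSON results are normally obtained by sending an `Accept: application/sparql-results+json` header.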
    • SPARQL
    • SPARQL
    • Semantic MediaWiki vs MediaWiki (I)
      > Semantic MediaWiki (like other semantic add-ons) is an extension of MediaWiki.
      > You can do at least as much as with plain MediaWiki.
      > Better and more specific search capabilities
        • Not only free-text search on pages
        • It resembles relational database searching
        • SPARQL =~ SQL
    • Semantic MediaWiki vs MediaWiki (II)
      > Better browsing interface (browsing through properties, not only categories)
      > Importing and exporting of the logical mesh
      > Easier exchange of information with 3rd-party applications (through RDF)
  • Protein-Wiki
      > Semantic wiki-based system for the management of a protein production service
      > Currently in testing phase
      > In collaboration with the CRG Protein Service
      > Customisation built after a study of their present workflow and actual needs
      > Intended for internal use
    • Protein-Wiki. Advantages
      > Cheaper approach than most similar commercial solutions
      > Open-source technology, with a blooming community behind it
      > Avoidance of vendor lock-in and abusive licensing
      > Customisable to specific needs and applicable to other cases
    • Protein-Wiki. Example Workflow
      [Workflow diagram. Actors: Researcher, Lab Member, Lab Manager, Finance Controller (order management system). Steps shown: access web interface, fill form, submit request, review scientific info, review study, review financial info, create study, assign study to core members, open study, retrieve SOP, perform all study steps, review study results, request review, prepare report, sign-off, send results/report, retrieve results/report, receive invoice. Review steps lead to Accept or Reject; several transitions are marked as meetings. Annotations: quotation, order number, communication.]
      Design: Guglielmo Roma
    • Protein-Wiki: User roles
      > Researchers: submit requests to the service using pre-defined templates, view the status of their requests at any time, and retrieve the study reports when experiments are complete
      > Lab members: can add and edit experimental data, but cannot create or delete experiments
      > Lab managers: can create, edit and delete experiments associated with submitted requests, using pre-defined templates
      > Administrators: creation of new templates, user management and user training
    • Protein-Wiki: permissions & security
      > Login and role permissions, granted automatically or via an administrator
      > Namespace-specific permissions:
        • Experiment:: (only lab members/managers)
        • Template:: (only administrators)
      > Page-specific permissions, by using user and parser functions extensions
      > Network?
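The namespace rule on this slide can be pictured with a small Python sketch. Roles and namespaces are taken from the slides; the data structure, the function name and the page titles are hypothetical, and the code does not mirror how MediaWiki itself is configured.

```python
# Which roles may edit pages in each restricted namespace (from the slide).
# The structure and names below are hypothetical, not MediaWiki configuration.
NAMESPACE_EDITORS = {
    "Experiment": {"lab member", "lab manager"},
    "Template": {"administrator"},
}


def can_edit(role, page_title):
    """Return True when the role may edit the page, judged by its namespace prefix."""
    namespace, separator, _ = page_title.partition(":")
    if not separator or namespace not in NAMESPACE_EDITORS:
        # Assumption: pages outside the restricted namespaces are open to everyone.
        return True
    return role in NAMESPACE_EDITORS[namespace]


print(can_edit("researcher", "Experiment:Exp001"))
print(can_edit("lab member", "Experiment:Exp001"))
print(can_edit("administrator", "Template:Request"))
```

In this sketch an administrator is not implicitly allowed into the Experiment namespace; whether that matches the real set-up is not stated on the slide.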
    • Protein-Wiki Homepage
    • Protein-Wiki. Request form
    • Protein-Wiki. Request form
    • Protein-Wiki. Request result page
    • Protein-Wiki. Enable experiment
    • Protein-Wiki. Experiment form
    • Protein-Wiki. Experiment form
    • Protein-Wiki. Experiment form
      > Logical input restrictions, linked to the data type
      • Protein-Wiki
      • Experiment page
      • Protein-Wiki
      • Browse experiment properties
    • Protein-Wiki. Semantic properties
      > Allowed values
      > Invalid value
      • Protein-Wiki. Conditional syntax
      > Enable certain experiment sections if requested by the researcher or lab manager
      > Input value restriction at the form level
      > Example: only nucleotides allowed in primer sequences
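The primer example can be expressed as a simple validation rule. The Python below is only a sketch of that rule, not the form definition used in Protein-Wiki; restricting input to the four plain DNA letters (no ambiguity codes such as N) is an assumption.

```python
import re

# Assumption: only the plain DNA alphabet is accepted, in either letter case.
PRIMER_PATTERN = re.compile(r"[ACGT]+", re.IGNORECASE)


def is_valid_primer(sequence):
    """Return True if the sequence is non-empty and contains only A, C, G or T."""
    return PRIMER_PATTERN.fullmatch(sequence.strip()) is not None


print(is_valid_primer("ATGCGTAC"))
print(is_valid_primer("ATGX"))
```

Rejecting bad values when the form is submitted keeps invalid sequences out of the semantic properties that later queries depend on.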
      • Protein-Wiki. List of tasks
      • May or may not be visible to researchers. Workload.
      Different fields depending on the user's role.
      • Protein-Wiki. List of tasks
      Any kind of customised list can be created from semantic properties.
      • Conclusions (I)
      Semantic MediaWiki (and other MediaWiki extensions) in lab workflow environments
      > Efficient collaboration between different users
      > Role-specific group permissions
        • Researchers, lab members, lab managers, administrators
      > Well-known interface. Everyone should have edited Wikipedia at least once!
      > Note-taking in the wiki for future consultation
      • Conclusions (II)
      Semantic MediaWiki (and other MediaWiki extensions) in lab workflow environments
      > Users can be both humans and robot script applications
      > Refined and specific queries
      > Logic connection with other semantically empowered software
      > Easy set-up of new environments (high-level programming)
      • Wiki templates, properties and forms vs coding and database design
      • Conclusions (III)
      Semantic MediaWiki (and other MediaWiki extensions) in lab workflow environments
      > Tracking (page history and recent changes)
      > Unless done by the wiki administrator, the workflow cannot be bypassed
      > Unless done by the system administrator, the history cannot be forged
      > Allows third-party quality auditing
    • Acknowledgments
      Bioinformatics Unit
      • Guglielmo Roma
      • Luca Cozzuto
      • Francesco Mancuso
      Protein Service
      • Michela Bertero
      • Silvia Speroni
      • Miriam Alloza