Sanger Mouse
 Resources Portal
A Testbed for Collaborative Data Integration


         Darren Oakley, Vivek Iyer, Bill Skarnes
Making a
Collaborative Data
     Portal...
‘Borg’ Approach

         •   Single group becomes
             sole owner/curator of
             portal and its data

         •   Other groups feed
             their data into portal
             group
burp
Why This Works


•   Clearly defined centre

•   It provides central curation for all data
Mouse Informatics

    •   Genes

        •   Mutants (ES Cells, Mice)

            •   Phenotypes

•   In mouse informatics, the traditional
    Borg is MGI - this has worked nicely
    for many years: http://informatics.jax.org
Mouse Informatics

•   Times are changing...

    •   Other informatics groups are providing
        high volume data and want in on the
        portal game
“Hand over your data, prepare to be assimilated”

“No, YOU hand over your data and prepare to be assimilated”

“Ahem, both of you, prepare to be assimilated!”
… which of you is the real Borg?
‘Federation’ Approach
            •   Each group hosts
                their own data and
                exposes it via defined
                services

            •   Make a ‘clever’ portal
                that pulls all of these
                resources together

            •   No single group is
                totally in charge
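
A minimal sketch of the federation idea, assuming each group exposes its data behind the same small lookup interface. The names `DataService`, `InMemoryService`, `Portal`, and the record fields are hypothetical and exist only for illustration; they are not taken from the portal's code.

```python
from typing import Protocol


class DataService(Protocol):
    """Hypothetical contract that every participating group agrees to expose."""

    name: str

    def lookup(self, term: str) -> list[dict]:
        ...


class InMemoryService:
    """Stand-in for one group's hosted data; a real service would be remote."""

    def __init__(self, name, records):
        self.name = name
        self._records = records

    def lookup(self, term):
        # Return only the records this group holds for the requested gene.
        return [r for r in self._records if r.get("gene") == term]


class Portal:
    """Pulls results together without owning any of the underlying data."""

    def __init__(self, services):
        self._services = list(services)

    def search(self, term):
        # Ask every service the same question and keep answers per group.
        return {s.name: s.lookup(term) for s in self._services}
```

Each group stays responsible for what its own `lookup` returns; the portal only knows the shared interface.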
The Sanger Mouse
 Resources Portal
 http://www.sanger.ac.uk/mouseportal

(Our Attempt at the Federation Approach...)
Distributed Data
•   Currently 5 distinct but related sets of
    mouse data:

    •   Gene Information

    •   Phenotyping

    •   Mutant Mouse Breeding

    •   Mutant ES Cell / Vector Production

    •   Other DNA Resources
Screenshot Tour
Technologies

•   Search Engine

•   Portal Interface

•   Data Services
MartSearch / Portal

•   Index searchable terms

•   Send the user's search term to Solr

•   Solr returns groups of terms to query
    the Biomarts with

•   Send asynchronous requests to each of the
    Biomarts for the data the user is interested in
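
The asynchronous step in this flow can be sketched as a concurrent fan-out. This is only an illustration under stated assumptions: the Biomarts are replaced by an in-memory dictionary (`FAKE_MARTS`, with placeholder rows), and `query_mart`, `fan_out`, and `search` are hypothetical names. A real portal would make network requests instead, possibly from the browser rather than the server.

```python
import asyncio

# Hypothetical stand-ins for remote Biomart services. The rows are
# placeholders, not real data.
FAKE_MARTS = {
    "genes": {"Cbx7": [{"symbol": "Cbx7", "note": "placeholder row"}]},
    "phenotyping": {"Cbx7": []},
}


async def query_mart(mart_name, terms):
    """Pretend to query one Biomart for a list of search terms."""
    await asyncio.sleep(0)  # yield control, as a network call would
    rows = []
    for term in terms:
        rows.extend(FAKE_MARTS[mart_name].get(term, []))
    return mart_name, rows


async def fan_out(terms_by_mart):
    """Query every Biomart concurrently and collect rows by mart name."""
    tasks = [query_mart(name, terms) for name, terms in terms_by_mart.items()]
    return dict(await asyncio.gather(*tasks))


def search(terms_by_mart):
    """Synchronous entry point for callers that are not already async."""
    return asyncio.run(fan_out(terms_by_mart))
```

Calling `search({"genes": ["Cbx7"], "phenotyping": ["Cbx7"]})` returns one entry per mart, so a slow or empty service does not block the others from being queried.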
User searches for ‘Cbx7’

•   Portal sends a search for ‘Cbx7’ to Solr

•   Solr returns JSON data containing information on
    what to search each Biomart by

•   Portal searches the Biomarts using query parameters
    defined by the Solr response

•   Search results are rendered using templates
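
The middle of this walk-through (turning the Solr response into per-Biomart query terms, then rendering rows with a template) might look roughly like the sketch below. The `response`/`docs` envelope is Solr's standard JSON layout; everything else is an assumption made for illustration, including the field names `marker_symbol` and `escell_clone`, the mapping `SEARCH_FIELD_BY_MART`, and the one-line text template.

```python
import json
from string import Template

# Hypothetical mapping: which Solr document field each Biomart is searched by.
SEARCH_FIELD_BY_MART = {"genes": "marker_symbol", "es_cells": "escell_clone"}

# Hypothetical one-line template; a real portal would render HTML.
RESULT_TEMPLATE = Template("[$mart] $symbol")


def terms_by_mart(solr_json):
    """Group field values from Solr docs into per-Biomart search terms."""
    docs = json.loads(solr_json)["response"]["docs"]
    grouped = {}
    for mart, field in SEARCH_FIELD_BY_MART.items():
        values = []
        for doc in docs:
            value = doc.get(field, [])
            # Solr fields may be single-valued or multi-valued.
            for item in value if isinstance(value, list) else [value]:
                if item not in values:
                    values.append(item)
        grouped[mart] = values
    return grouped


def render_results(mart, rows):
    """Render each result row for one Biomart through the template."""
    return "\n".join(
        RESULT_TEMPLATE.substitute(mart=mart, symbol=row["symbol"]) for row in rows
    )
```

Keeping the field-to-mart mapping as data rather than code is one way to let new data sources join without changes to the search logic.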
Extending The Portal

•   Put new data into a Biomart

•   Write JSON config file for MartSearch
    (defining filters to index and use)

•   Rebuild the index
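
To make the extension steps concrete, here is a sketch of what such a config and an index rebuild could involve. The config keys (`dataset`, `filters_to_index`, `search_by`) and the helper names are hypothetical; the actual MartSearch config schema may differ, so treat this as a shape for discussion and not as the real format.

```python
import json

# Hypothetical config shape; the real MartSearch schema may differ.
CONFIG_JSON = """
{
  "dataset": "example_mart",
  "filters_to_index": ["marker_symbol", "escell_clone"],
  "search_by": "marker_symbol"
}
"""

REQUIRED_KEYS = {"dataset", "filters_to_index", "search_by"}


def load_config(text):
    """Parse a dataset config and check that it is internally consistent."""
    config = json.loads(text)
    missing = REQUIRED_KEYS - config.keys()
    if missing:
        raise ValueError(f"config is missing keys: {sorted(missing)}")
    if config["search_by"] not in config["filters_to_index"]:
        raise ValueError("search_by must be one of filters_to_index")
    return config


def build_index_docs(config, mart_rows):
    """Keep only the configured fields from each row, ready for indexing."""
    fields = config["filters_to_index"]
    return [{f: row[f] for f in fields if f in row} for row in mart_rows]
```

Rebuilding the index would then mean fetching rows from each configured Biomart, passing them through `build_index_docs`, and posting the resulting documents to the search engine.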
Advantages


•   Easily extensible

•   Data responsibility shared
Disadvantages

•   Hard to avoid redundancy

    •   Sometimes needed for data linking

•   Un-curated

    •   Each group can curate its own data

    •   No curation as a whole
Disclaimer
•   Windows users...

    •   If you use IE - it will eat your browser

    •   Use Firefox/Chrome/Safari/Opera for
        a more pleasant internet experience

    •   We are working on it - IE 8 gives an ok
        experience...
Acknowledgments
•   Funding: I-DCC grant (EU FP7)

    •   Coordination of informatic resources
        from high-throughput mouse ES cell
        mutagenesis programs

•   Wellcome Trust Sanger Institute

    •   T87 - ES Cell Mutagenesis

    •   MIG - Mouse Informatics Group
http://www.sanger.ac.uk/mouseportal

http://github.com/dazoakley/martsearch

do2@sanger.ac.uk

dazoakley
