Building a Distributed Data Portal

Slides from a presentation I gave at SciBarCamb 2011 (9th April, 2011) in Cambridge (UK).

It goes through some of my recent work and thinking on setting up a data portal using distributed web services, allowing easy data sharing and reducing the effort of data maintenance.

Building a Distributed Data Portal Presentation Transcript

  • 1. Building a Distributed Data Portal, Darren Oakley, SciBarCamb 2011
  • 2. Background: mouse informatics @ Sanger Institute; work with lots of other groups; need to share, integrate and represent lots of datatypes; both OUR and OTHER people’s data
  • 3. gene information (IDs, location, GO etc.); related human diseases (OMIM, GWAS); expression; phenotyping; mutant mouse breeding; mutant ES cells, vectors
  • 4. that’s a lot of stuff...
  • 5. we can do this one of two ways...
  • 6. ‘Borg’ Approach: single group becomes sole owner/curator of portal and its data; other groups feed their data into portal group
  • 7. burp
  • 8. Pros: clearly defined centre to the universe; provides central curation to all data
  • 9. Cons: huge effort to curate and maintain large and diverse dataset; hold / maintain your own db of everything; integrating totally new / different data becomes a challenge; single group becomes effective ‘owner’; can stifle innovation and new ideas
  • 10. what happens when more than one group tries to do this?
  • 11. “Hand over your data, prepare to be assimilated” “No, YOU hand over your data and prepare to be assimilated” “Ahem, both of you, prepare to be assimilated!”
  • 12. “Hand over your data, prepare to be assimilated” “No, YOU hand over your data and prepare to be assimilated” … which of you is the real Borg? “Ahem, both of you, prepare to be assimilated!”
  • 13. ‘Federation’ Approach: each group hosts their own data and exposes it via defined services; make a ‘clever’ portal that pulls these resources together; no single group is totally in charge
  • 14. Use data for a more specialized purpose; build own portal competitor
  • 15. The Tech: search engine; data sources; web service
  • 16. MartSearch / Portal
  • 17. MartSearch / Portal
  • 18. MartSearch / Portal: index searchable terms
  • 19. MartSearch / Portal: index searchable terms
  • 20. MartSearch / Portal: index searchable terms
  • 21. MartSearch / Portal: index searchable terms; send user’s search term to Solr
  • 22. MartSearch / Portal: index searchable terms; send user’s search term to Solr; Solr returns groups of terms to query data sources with
  • 23. MartSearch / Portal: index searchable terms; send user’s search term to Solr; Solr returns groups of terms to query data sources with; send asynchronous requests to each of the data sources for the data the user is interested in
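
The search flow built up in slides 21 to 23 can be sketched roughly as follows. This is a minimal Python illustration, not MartSearch's actual code: the in-memory `FAKE_INDEX` stands in for Solr, `query_source` stands in for a call to a web service or BioMart, and identifiers such as `GENE_A` are made up.

```python
import asyncio

# Hypothetical stand-in for the Solr index. For each search term it lists,
# per data source, the linking terms to query that source with.
FAKE_INDEX = {
    "diabetes": {
        "genes": ["GENE_A", "GENE_B"],
        "phenotypes": ["GENE_A"],
    },
}


async def query_source(name, terms):
    """Pretend to call one remote data source (no real network access)."""
    await asyncio.sleep(0)  # yield control, as a real HTTP request would
    return name, [f"{name} data for {term}" for term in terms]


async def search(term):
    """Look up the term in the index, then query every source concurrently."""
    plan = FAKE_INDEX.get(term, {})
    requests = [query_source(name, terms) for name, terms in plan.items()]
    responses = await asyncio.gather(*requests)
    return dict(responses)


results = asyncio.run(search("diabetes"))
print(results)
```

Because the requests are gathered rather than awaited one by one, a slow data source delays only its own part of the result, which is the point of sending the requests asynchronously.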
  • 24. User searches for ‘diabetes’
  • 25. User searches for ‘diabetes’: Search for ‘diabetes’
  • 26. User searches for ‘diabetes’: Search for ‘diabetes’ → JSON data containing information on what to search each datasource by...
  • 27. User searches for ‘diabetes’: Search for ‘diabetes’ → JSON data containing information on what to search each datasource by... → Search using query parameters defined by Solr response
  • 28. User searches for ‘diabetes’: Search for ‘diabetes’ → JSON data containing information on what to search each datasource by... → Search using query parameters defined by Solr response → Render search results using templates
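
As a rough illustration of the last two steps (querying each data source with the parameters from the index response, then rendering with templates), here is a small Python sketch. The JSON shape, the `fetch` helper, the field name `gene_id` and the plain-text template are all assumptions made for the example; the slides do not show the real response format or template system.

```python
import json
from string import Template

# Made-up example of the JSON the index might return for 'diabetes':
# for each data source, the field and values to search it by.
INDEX_JSON = """
{
  "genes":      {"gene_id": ["GENE_A", "GENE_B"]},
  "phenotypes": {"gene_id": ["GENE_A"]}
}
"""

# One line of output per data source; a real portal would use HTML templates.
RESULT_TEMPLATE = Template("$source: $count result(s) for $field = $values")


def fetch(source, field, values):
    """Hypothetical data source call: returns one fake record per value."""
    return [{"source": source, field: value} for value in values]


def render_results(index_json):
    """Turn the index response into per-source queries and render the results."""
    plan = json.loads(index_json)
    lines = []
    for source, filters in plan.items():
        for field, values in filters.items():
            records = fetch(source, field, values)
            lines.append(RESULT_TEMPLATE.substitute(
                source=source,
                count=len(records),
                field=field,
                values=", ".join(values),
            ))
    return lines


rendered = render_results(INDEX_JSON)
print("\n".join(rendered))
```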
  • 29. Pros: easily extendable; data curation done by primary data producers / handlers; YOU don’t have to keep / maintain copies of everything
  • 30. Cons: hard to avoid some data redundancy; need common linking terms; un-curated as a whole
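
To make the "common linking terms" requirement concrete: results from independent data sources can only be shown together if they share some identifier. The Python sketch below groups records from two made-up sources by a shared `gene_id` field; the records, field names and the `merge_by_link` helper are all hypothetical.

```python
from collections import defaultdict

# Invented records from two independent data sources that share 'gene_id'.
genes = [
    {"gene_id": "GENE_A", "symbol": "SymA"},
    {"gene_id": "GENE_B", "symbol": "SymB"},
]
phenotypes = [
    {"gene_id": "GENE_A", "phenotype": "example phenotype"},
]


def merge_by_link(link_field, **sources):
    """Group records from every source under their shared linking term."""
    merged = defaultdict(dict)
    for source_name, records in sources.items():
        for record in records:
            key = record[link_field]
            merged[key].setdefault(source_name, []).append(record)
    return dict(merged)


combined = merge_by_link("gene_id", genes=genes, phenotypes=phenotypes)
print(combined["GENE_A"])
```

A source whose records lack the linking field cannot be joined this way, which is why the linking terms have to be agreed between groups.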
  • 31. Extending the Portal: set up or find a new datasource to add (other web service, another BioMart); write a simple config/adaptor to talk to it
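
One way to picture "a simple config/adaptor" is a small config record plus a function that knows how to talk to that one data source. The Python sketch below is an assumption about the general shape, not MartSearch's real configuration format: `DataSourceConfig`, `Portal`, `expression_adaptor` and the example.org URL are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class DataSourceConfig:
    """Hypothetical per-source settings."""
    name: str
    url: str
    link_field: str  # the common linking term this source is queried by


class Portal:
    """Keeps a registry of data sources and queries all of them."""

    def __init__(self):
        self._sources = {}

    def add_source(self, config, adaptor):
        # An adaptor is any callable taking (config, terms) and returning records.
        self._sources[config.name] = (config, adaptor)

    def query_all(self, terms):
        return {
            name: adaptor(config, terms)
            for name, (config, adaptor) in self._sources.items()
        }


def expression_adaptor(config, terms):
    """Fake adaptor; a real one would send an HTTP request to config.url."""
    return [{config.link_field: term, "from": config.url} for term in terms]


portal = Portal()
portal.add_source(
    DataSourceConfig("expression", "https://example.org/expression", "gene_id"),
    expression_adaptor,
)
hits = portal.query_all(["GENE_A"])
print(hits)
```

Adding another data source then means writing one more config and adaptor pair, with no change to the portal itself.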
  • 32. www.knockoutmouse.org/martsearch | github.com/i-dcc/martsearch | @dazoakley