Building a Distributed Data Portal

Slides from a presentation I gave at SciBarCamb 2011 (9th April, 2011) in Cambridge (UK).

It covers some of the recent work and thinking I've been doing on setting up a data portal built from distributed web services, which makes data sharing easier and reduces the effort of data maintenance.

  1. Building a Distributed Data Portal. Darren Oakley, SciBarCamb 2011
  2. Background
     • mouse informatics @ Sanger Institute
     • work with lots of other groups
       • need to share, integrate and represent lots of datatypes
       • both OUR and OTHER people's data
  3. • gene information (IDs, location, GO etc.)
     • related human diseases (OMIM, GWAS)
     • expression
     • phenotyping
     • mutant mouse breeding
     • mutant ES cells, vectors
  4. that's a lot of stuff...
  5. we can do this one of two ways...
  6. 'Borg' Approach
     • single group becomes sole owner/curator of portal and its data
     • other groups feed their data into portal group
  7. burp
  8. Pros
     • clearly defined centre to the universe
     • provides central curation to all data
  9. Cons
     • huge effort to curate and maintain large and diverse dataset
       • hold / maintain your own db of everything
       • integrating totally new / different data becomes a challenge
     • single group becomes effective 'owner'
     • can stifle innovation and new ideas
  10. what happens when more than one group tries to do this?
  11-12. "Hand over your data, prepare to be assimilated"
     "No, YOU hand over your data and prepare to be assimilated"
     "Ahem, both of you, prepare to be assimilated!"
     … which of you is the real Borg?
  13. 'Federation' Approach
     • each group hosts their own data and exposes it via defined services
     • make a 'clever' portal that pulls these resources together
     • no single group is totally in charge
  14. • Use data for a more specialized purpose
     • Build own portal competitor
  15. The Tech
     • search engine
     • data sources
     • web service
  16-23. MartSearch / Portal
     • index searchable terms
     • send user's search term to Solr
     • Solr returns groups of terms to query data sources with
     • send asynchronous requests to each of the data sources for the data the user is interested in
  24-28. User searches for 'diabetes'
     • Search for 'diabetes'
     • JSON data containing information on what to search each datasource by...
     • Search using query parameters defined by Solr response
     • Render search results using templates
  29. Pros
     • easily extendable
     • data curation done by primary data producers / handlers
     • YOU don't have to keep / maintain copies of everything
  30. Cons
     • hard to avoid some data redundancy
       • need common linking terms
     • un-curated as a whole
  31. Extending the Portal
     • set-up or find a new datasource to add
       • other web service
       • another biomart
     • write a simple config/adaptor to talk to it
  32. www.knockoutmouse.org/martsearch
     github.com/i-dcc/martsearch
     @dazoakley
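
The search flow described on the MartSearch / Portal and 'diabetes' slides (ask the index, get back JSON saying what to query each data source by, fan out asynchronous requests, render with templates) can be sketched roughly as below. This is an illustration only and not MartSearch's actual code: the function names, the JSON shape and the placeholder identifiers are all assumptions, and the network calls are replaced by a stand-in so the example runs offline.

```python
import asyncio
import json
from string import Template

# Hypothetical index response for one search term: for each data source,
# the linking terms to query it by. The shape and values are made up.
INDEX_RESPONSE = json.dumps({
    "genes": {"gene_id": ["GENE:0001", "GENE:0002"]},
    "phenotyping": {"marker_symbol": ["Abc1"]},
})

async def query_source(name, params):
    """Stand-in for an HTTP request to one data source (no real network I/O)."""
    await asyncio.sleep(0)  # yield control, as a real request would
    rows = sum(len(values) for values in params.values())
    return name, {"queried_by": params, "rows": rows}

async def search(index_json):
    """Fan out one concurrent request per data source named by the index."""
    plan = json.loads(index_json)
    tasks = [query_source(name, params) for name, params in plan.items()]
    results = await asyncio.gather(*tasks, return_exceptions=True)
    # Drop failed sources so one broken service does not empty the page.
    return dict(r for r in results if not isinstance(r, BaseException))

ROW_TEMPLATE = Template("$name: $rows result(s)")

def render(results):
    """Minimal template step: one line per data source, sorted by name."""
    return "\n".join(
        ROW_TEMPLATE.substitute(name=name, rows=data["rows"])
        for name, data in sorted(results.items())
    )

def run_search(index_json=INDEX_RESPONSE):
    return render(asyncio.run(search(index_json)))
```

Calling `run_search()` returns one rendered line per data source. Because the requests are gathered concurrently, a slow source delays the page no more than the slowest single request would.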

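The 'Extending the Portal' slide says a new data source needs only a simple config/adaptor. One way that could look, assuming a registry keyed by source name where each entry declares the common linking term it is joined by, is sketched below. `DataSource`, `register`, `query_all` and the fake expression service are hypothetical names for illustration and are not taken from MartSearch.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class DataSource:
    name: str        # label used by the portal
    joined_by: str   # common linking term shared with the search index
    fetch: Callable[[List[str]], List[dict]]  # adaptor: linking values in, rows out

REGISTRY: Dict[str, DataSource] = {}

def register(source: DataSource) -> None:
    """Adding a data source is one registry entry; the portal core is untouched."""
    REGISTRY[source.name] = source

def fake_expression_service(ids: List[str]) -> List[dict]:
    """Stand-in adaptor; a real one would call a web service or a BioMart."""
    return [{"id": i, "tissue": "placeholder"} for i in ids]

register(DataSource("expression", "marker_symbol", fake_expression_service))

def query_all(terms: Dict[str, List[str]]) -> Dict[str, List[dict]]:
    """Query every registered source whose linking term appears in `terms`."""
    return {
        source.name: source.fetch(terms[source.joined_by])
        for source in REGISTRY.values()
        if source.joined_by in terms
    }
```

A source is skipped when the index returns no values for its linking term, which is why the federated approach depends on agreeing those common terms up front.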