Building a Distributed Data Portal
Darren Oakley

SciBarCamb 2011
Background

•mouse informatics @ Sanger Institute
•work with lots of other groups
 •need to share, integrate and represent
    lots of datatypes
 •both OUR and OTHER people's data
•gene information (IDs, location, GO etc.)
•related human diseases (OMIM, GWAS)
•expression
•phenotyping
•mutant mouse breeding
•mutant ES cells, vectors
that's a lot of stuff...
we can do this one of two ways...
'Borg' Approach

• single group becomes sole owner/curator
  of portal and its data
• other groups feed their data into portal group
burp
Pros


•clearly defined centre to the universe
•provides central curation to all data
Cons
•huge effort to curate and maintain large
  and diverse dataset
 •hold / maintain your own db of
    everything
 •integrating totally new / different data
    becomes a challenge
•single group becomes effective 'owner'
•can stifle innovation and new ideas
what happens when more than one group tries to do this?
“Hand over your data, prepare to be assimilated”

“No, YOU hand over your data and prepare to be assimilated”

“Ahem, both of you, prepare to be assimilated!”

… which of you is the real Borg?
'Federation' Approach

• each group hosts their own data and
  exposes it via defined services
• make a 'clever' portal that pulls these
  resources together
• no single group is totally in charge
Use data for a more specialized purpose

Build own portal competitor
The Tech

•search engine
•data sources
•web service
MartSearch / Portal

•index searchable terms
•send user's search term to Solr
•Solr returns groups of terms to query data sources with
•send asynchronous requests to each of the data sources
  for the data the user is interested in
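
The flow above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not MartSearch's actual code: `FAKE_INDEX` stands in for the Solr index, `DATA_SOURCES` and all identifiers are made up, and `query_source` pretends to be a remote call instead of issuing real HTTP requests.

```python
import asyncio

# Hypothetical stand-in for the search index: maps a user's search term to
# groups of terms, one group per field that data sources can be queried by.
FAKE_INDEX = {
    "diabetes": {"gene_id": ["GENE:1", "GENE:2"], "disease_id": ["DIS:9"]},
}

# Hypothetical data sources, each mapped to the field it is queried by.
DATA_SOURCES = {
    "expression": "gene_id",
    "phenotyping": "gene_id",
    "human_disease": "disease_id",
}


def search_index(term):
    """Return grouped query terms for a search term (empty if unknown)."""
    return FAKE_INDEX.get(term.lower(), {})


async def query_source(name, values):
    """Pretend to call one remote data source; a real version would do HTTP."""
    await asyncio.sleep(0)  # yield control, as a network call would
    return name, [f"{name} record for {v}" for v in values]


async def portal_search(term):
    """Look the term up in the index, then query all sources concurrently."""
    groups = search_index(term)
    tasks = [
        query_source(name, groups[field])
        for name, field in DATA_SOURCES.items()
        if field in groups
    ]
    return dict(await asyncio.gather(*tasks))


if __name__ == "__main__":
    for source, records in asyncio.run(portal_search("diabetes")).items():
        print(source, records)
```

The point of the design is that the portal only holds the index of searchable terms; the records themselves are fetched from each source at query time.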
User searches for 'diabetes'

•Search for 'diabetes'
•JSON data containing information on
  what to search each datasource by...
•Search using query parameters
  defined by Solr response
•Render search results using templates
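
As a rough sketch of the middle and last steps, the snippet below takes a JSON payload, groups the identifiers each data source would be searched by, and renders one line per result with a template. The payload shape, field names and identifiers are all hypothetical; the slides do not show the real Solr response or the portal's templates.

```python
import json
from string import Template

# Hypothetical JSON of the kind an index might return for 'diabetes':
# each document lists identifiers that data sources can be searched by.
index_response = json.loads("""
{
  "docs": [
    {"gene_id": "GENE:1", "symbol": "Abc1", "disease_id": ["DIS:9"]},
    {"gene_id": "GENE:2", "symbol": "Xyz2", "disease_id": ["DIS:9", "DIS:4"]}
  ]
}
""")


def terms_by_field(response, fields):
    """Collect the unique values of each field across all returned docs."""
    grouped = {field: [] for field in fields}
    for doc in response["docs"]:
        for field in fields:
            value = doc.get(field, [])
            values = value if isinstance(value, list) else [value]
            for v in values:
                if v not in grouped[field]:
                    grouped[field].append(v)
    return grouped


# A deliberately tiny template; a real portal would use richer HTML templates.
ROW = Template("$symbol ($gene_id): $n_diseases related disease(s)")


def render(response):
    """Render one text line per document using the ROW template."""
    return "\n".join(
        ROW.substitute(
            symbol=doc["symbol"],
            gene_id=doc["gene_id"],
            n_diseases=len(doc["disease_id"]),
        )
        for doc in response["docs"]
    )
```

Calling `terms_by_field(index_response, ["gene_id", "disease_id"])` gives the per-field term groups that the portal would pass on to the data sources.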
Pros

•easily extendable
•data curation done by primary data
  producers / handlers
•YOU don't have to keep / maintain copies
  of everything
Cons


•hard to avoid some data redundancy
 •need common linking terms
•un-curated as a whole
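
To see why common linking terms matter, here is a minimal sketch that merges records from two sources on a shared key. The record fields and identifiers are invented for illustration; records that lack the linking term simply cannot be joined and are dropped.

```python
from collections import defaultdict


def merge_by_key(key, *sources):
    """Merge records from several sources that share a common linking term.

    Records without the key are skipped, because there is nothing to join on.
    """
    merged = defaultdict(dict)
    for records in sources:
        for record in records:
            if key in record:
                merged[record[key]].update(record)
    return dict(merged)


# Hypothetical records from two independently hosted sources.
expression = [{"gene_id": "GENE:1", "tissue": "liver"}]
phenotyping = [
    {"gene_id": "GENE:1", "phenotype": "abnormal glucose"},
    {"allele": "tm1a", "phenotype": "no linking term here"},
]

combined = merge_by_key("gene_id", expression, phenotyping)
```

Here `combined` holds a single merged entry for `GENE:1`; the second phenotyping record has no `gene_id` and is left out.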
Extending the Portal

•set up or find a new datasource to add
 •other web service
 •another BioMart
•write a simple config/adaptor to talk to it
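
One way to picture the "simple config/adaptor" step is a small registry keyed by the kind of data source. The sketch below is an assumption-laden illustration, not MartSearch's real configuration format: `DataSourceConfig`, the `adaptor` decorator, the request dictionaries and the example.org URLs are all hypothetical, and the functions only describe a request without sending it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class DataSourceConfig:
    """Hypothetical per-source config: what it is and which term links it."""
    name: str
    kind: str        # e.g. "biomart" or "web_service"
    url: str
    joined_by: str   # the common linking term used to query this source


# Registry of adaptor functions, one per kind of data source.
ADAPTORS: Dict[str, Callable[[DataSourceConfig, List[str]], dict]] = {}


def adaptor(kind):
    """Decorator that registers a request-building function for a kind."""
    def register(fn):
        ADAPTORS[kind] = fn
        return fn
    return register


@adaptor("biomart")
def build_biomart_request(config, values):
    # Illustrative shape only; this is not the real BioMart query format.
    return {"url": config.url, "filters": {config.joined_by: ",".join(values)}}


@adaptor("web_service")
def build_web_service_request(config, values):
    return {"url": config.url, "params": [(config.joined_by, v) for v in values]}


def build_request(config, values):
    """Pick the adaptor for this source's kind and describe the request."""
    try:
        build = ADAPTORS[config.kind]
    except KeyError:
        raise ValueError(f"no adaptor registered for kind {config.kind!r}") from None
    return build(config, values)
```

With this shape, adding a new datasource of a known kind is a config entry, and adding a new kind of source is one more decorated function.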
www.knockoutmouse.org/martsearch

github.com/i-dcc/martsearch

@dazoakley
