G. A. Thorisson, A. J. Webb, R. Dalgleish ULEIC / J. Muilu FIMM



             Identification of G2P databases -
           challenges and proposal for a solution
                               Gudmundur A. Thorisson <gt50@leicester.ac.uk> ULEIC
                                   Adam J. Webb <ajw51@leicester.ac.uk> ULEIC
                                  Raymond Dalgleish <ray@leicester.ac.uk> ULEIC
                                    Juha Muilu <juha.muilu@helsinki.fi> FIMM



                                                                 -- Overview --
           ✴ Identification difficulties - the Knowledge Centre perspective
            ✴ Or, why we need persistent identifiers for database resources

           ✴ Proposal to collaborate with the BioDBCore initiative
            ✴ standardizing registration & description of bio-databases

                                                This work is published under the Creative Commons Attribution license
                                                (CC BY: http://creativecommons.org/licenses/by/3.0/) which means that
                                                it can be freely copied, redistributed and adapted, as long as proper
                                                attribution is given.

 GEN2PHEN 8th General Assembly Meeting, Leiden, Jan 24-25 2012             1
Friday, 27 January 12
Linking resources

  [Figure: external records / annotations; databases containing variants c.301C>T, c.465A>G, c.555G>T, c.103C>T, c.321G>T; contributors labelled DB maintainer, Submitter, Submitter, DB maintainer]

URLs are unstable
                        http://subdomain.example.com/path/to/resource

        • Domain names / subdomains can change
              – hgvbaseg2p.org -> gwascentral.org
              – server1.example.com -> server2.example.com
        • Paths can change
              – e.g. /LOVD2/ changes to /LOVD3/
        • LSDB genes can move
              – e.g. gene ADAM19 moves from one LOVD install to another
        • Databases can merge
              – e.g. databases for gene ADAM19 on two different installs are reconciled into a
                single install
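
A minimal sketch of what citing a database by URL implies: every consumer has to keep a table of known relocations and rewrite old links. The host and path rules are the examples from this slide; the function name and the sample link (its file name and query string) are made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit

# Known relocations, taken from the examples on this slide.
HOST_MOVES = {
    "hgvbaseg2p.org": "gwascentral.org",
    "server1.example.com": "server2.example.com",
}
PATH_MOVES = {"/LOVD2/": "/LOVD3/"}


def current_location(url: str) -> str:
    """Rewrite a previously recorded URL using the known relocation rules."""
    parts = urlsplit(url)
    host = HOST_MOVES.get(parts.netloc, parts.netloc)
    path = parts.path
    for old, new in PATH_MOVES.items():
        if path.startswith(old):
            path = new + path[len(old):]
    return urlunsplit((parts.scheme, host, path, parts.query, parts.fragment))


if __name__ == "__main__":
    # Hypothetical recorded link; the file name and query are assumptions.
    print(current_location("http://server1.example.com/LOVD2/variants.php?gene=ADAM19"))
```

The table only covers moves somebody has already noticed and recorded, which is the maintenance burden a persistent identifier is meant to remove.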


                 IDENTIFIER  1:1  DATA RESOURCE

        • Gene name not suitable
              – More than one database can exist for a given gene
                        • gene.lovd.nl -> returns list of databases (or redirects if only 1 is
                          known)
                            – 1 to many
                        • lovd.nl/gene -> redirects to *one* database
                            – 1 to one, but many resources do not receive identifiers
                        • These are locators, not identifiers
        • Non-gene based resources
        • Ideally the identifier should also operate as the
          locator (like DOIs via a DOI resolution service)
              – http://dx.doi.org/10.19192 resolves DOI 10.19192
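
A rough sketch of the 1:1 requirement, assuming two naming schemes modelled as plain dictionaries: one that lists every database for a gene (as gene.lovd.nl does) and one that redirects to a single database (as lovd.nl/gene does). The install URLs and both function names are hypothetical.

```python
def ambiguous_names(name_to_resources):
    """Names that point at more than one resource (1 to many)."""
    return sorted(
        name for name, targets in name_to_resources.items() if len(set(targets)) > 1
    )


def resources_without_name(all_resources, name_to_resources):
    """Resources that no name in the scheme points at."""
    named = {t for targets in name_to_resources.values() for t in targets}
    return sorted(set(all_resources) - named)


# Hypothetical installs: ADAM19 is curated in two places.
INSTALLS = [
    "http://lovd-a.example.org/ADAM19",
    "http://lovd-b.example.org/ADAM19",
    "http://lovd-a.example.org/COL3A1",
]
LIST_SCHEME = {"ADAM19": INSTALLS[:2], "COL3A1": [INSTALLS[2]]}
REDIRECT_SCHEME = {"ADAM19": [INSTALLS[0]], "COL3A1": [INSTALLS[2]]}

if __name__ == "__main__":
    print(ambiguous_names(LIST_SCHEME))
    print(resources_without_name(INSTALLS, REDIRECT_SCHEME))
```

Under these assumptions the list scheme fails because one name is ambiguous, and the redirect scheme fails because one database is left with no name at all.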






              Proposal to collaborate with BioDBCore
     • BioDBCore aims
            – annotation - organize the bio-database
              ‘resourceome’
            – discovery - e.g. which protein
              sequence databases are available?



     • Who’s behind it?
            – International Society for Biocuration
            – Resource catalogues: Bioinformatics
              Links, BioSiteMaps, NAR db-issue etc.
            – Working group includes reps from NAR
              and DATABASE journals, MIBBI, model
              organism databases, CASIMIR mouse
              informatics consortium, others







             Persistent resource identifiers in BioDBCore
     • They plan to use MIRIAM registry / ID resolution service
           – unique, persistent and unambiguous identification of various kinds of concepts
                 • http://identifiers.org/ec-code/1.1.1.1
                 • http://identifiers.org/pubmed/16333295
                 • http://identifiers.org/doi/10.1038/nbt1156



     • Decouples identification from location
     • Many resources are already registered with MIRIAM
     • Operated by EBI <-- long-term sustainability prospect
     • Adoption by players in the LS Semantic Web community
           – URIs for identifying entities in biological information represented in RDF
           – http://lsrn.org, Shared Names, Bio2RDF, others
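
As an illustration of the URI pattern in the three examples above, here is a small sketch that splits an identifiers.org URI into a namespace and a local identifier. The function name is an assumption, and the parsing rule (everything after the first path segment is the local identifier) is inferred from these examples, not taken from a MIRIAM specification.

```python
from urllib.parse import urlsplit


def parse_identifiers_org(uri: str) -> tuple[str, str]:
    """Split an identifiers.org URI into (namespace, local identifier)."""
    parts = urlsplit(uri)
    if parts.netloc != "identifiers.org":
        raise ValueError(f"not an identifiers.org URI: {uri}")
    # Split on the first slash only: a DOI contains a slash of its own.
    namespace, _, local_id = parts.path.lstrip("/").partition("/")
    if not namespace or not local_id:
        raise ValueError(f"missing namespace or identifier: {uri}")
    return namespace, local_id


if __name__ == "__main__":
    for example in (
        "http://identifiers.org/ec-code/1.1.1.1",
        "http://identifiers.org/pubmed/16333295",
        "http://identifiers.org/doi/10.1038/nbt1156",
    ):
        print(parse_identifiers_org(example))
```

Nothing in the URI says where the record is hosted, which is the sense in which identification is decoupled from location.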







                                      How might this work?
         • Using database URIs - plausible scenario
               – Persistent canonical URI: http://identifiers.org/biodbcore/10235900
               – Click URL, browser redirects to http://biodbcore.org/resource/10235900
               – This shows the BioDBCore metadata record for the database (akin to a “landing page”
                 on an online journal site)



         • BioDBCore “landing page” presents database metadata
               – Information *about* the “thing”
               – Name: Ehlers-Danlos Syndrome Variant Database
                 Main resource URL: https://eds.gene.le.ac.uk <-- the “thing” itself
                 [scope, data standards, other metadata]



         • Location of database = the “thing” itself
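
The scenario on this slide can be sketched as two lookups: the canonical URI maps to a landing page, and the metadata record behind that page carries the current location of the database. Everything here is illustrative: the URIs come from the "plausible scenario" above, the pairing of record 10235900 with the Ehlers-Danlos database is an assumption, and the in-memory table stands in for a registry that does not exist in this form.

```python
from dataclasses import dataclass

CANONICAL_PREFIX = "http://identifiers.org/biodbcore/"
LANDING_PREFIX = "http://biodbcore.org/resource/"


@dataclass(frozen=True)
class MetadataRecord:
    """Information *about* the database, not the database itself."""
    name: str
    main_resource_url: str


# Hypothetical registry content; the record/database pairing is assumed.
RECORDS = {
    "10235900": MetadataRecord(
        name="Ehlers-Danlos Syndrome Variant Database",
        main_resource_url="https://eds.gene.le.ac.uk",
    ),
}


def landing_page(canonical_uri: str) -> str:
    """Step 1: the persistent URI redirects to the metadata landing page."""
    if not canonical_uri.startswith(CANONICAL_PREFIX):
        raise ValueError(f"not a BioDBCore canonical URI: {canonical_uri}")
    return LANDING_PREFIX + canonical_uri[len(CANONICAL_PREFIX):]


def lookup(canonical_uri: str) -> MetadataRecord:
    """Step 2: the landing page presents the record, including the location."""
    record_id = landing_page(canonical_uri).rsplit("/", 1)[1]
    return RECORDS[record_id]


if __name__ == "__main__":
    uri = "http://identifiers.org/biodbcore/10235900"
    print(landing_page(uri))
    print(lookup(uri).main_resource_url)
```

If the database moves, only `main_resource_url` in the record would change; the canonical URI that others have cited stays the same.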





                                              Mutual benefits
         • To GEN2PHEN / G2P community
               – Identification - slot into a resource identifier scheme for bio-databases globally, build
                 more detailed catalogues & annotation systems around this
               – Discovery - finding relevant LSDBs and other G2P resources via a range of search/query
                 tools outside the KC or LSDB lists
               – BioDBCore could possibly evolve into a sort of live “database publishing
                 platform”, instead of the static “snapshot” of conventional papers.



         • To BioDBCore initiative
               – Acquire an entire category’s worth of metadata records & link to community
               – Extra pairs of eyes on what they’re doing, alternative perspective
               – Potential for further collaboration on contrib. tracking tools & ORCID integration








                Open questions, known unknowns etc.
         • BioDBCore quite new, many things remain in flux
               – e.g. the MIRIAM / identifiers.org technical details are vague



         • DOIs for BioDBCore records - register database DOIs for fuller
           integration into publishing process?


         • How will this work with existing LSDB lists?




G. A. Thorisson, ULEIC




                                      Acknowledgements
   GEN2PHEN Consortium
          http://www.gen2phen.org/about-gen2phen/partners
   Prof Anthony J. Brookes Bioinformatics Group, Leicester

   This work has received funding from the European Community's Seventh
   Framework Programme (FP7/2007-2013) under grant agreement number 200754 -
   the GEN2PHEN project.

                              Contact me!

         <gt50@le.ac.uk> | <gthorisson@gmail.com>
                http://www.linkedin.com/in/mummi
                 http://www.twitter.com/gthorisson
                        http://www.gthorisson.name

   Published under the CC BY license (http://creativecommons.org/licenses/by/3.0/)





