CNI fall 2008

    e-research, data integration

            john wilbanks
creative commons / science commons
1. e-research requires
new approaches to data
       integration.
databases as unique entities,
instead of nodes in a network
http://nar.oxfordjournals.org/cgi/content/full/gkm1037/DC1/1
“packages”
not monolithic
not centralized
scalable
aggregation
not-software
  scalable
  modular
what about science?
science is not unlike wikipedia...




...except authenticated, and expensive.
inefficient and expensive ecosystem of
  processes to peer-produce and
     review scholarly content
from a technical perspective
2. multivariate connected
barriers to new methods
    of data integration
cognitive barriers.
the knowledge was
   human-scale
web 2.0, science 3.0, what about making
          Google work better?
over 200
   years at
one paper/day
what you want is
    a list of genes.

not a list of documents.
technical barriers.
IGFBP-5 plays a role in the
regulation of cellular senescence
via a p53-dependent pathway
and in aging-associated
vascular diseases
IGFBP-5 plays a role in the
regulation of cellular senescence
via a p53-dependent pathway
and in aging-associated
vascular diseases
tradition barriers.
legal barriers.
indexing: disallowed.




 http://orpheus-1.ucsd.edu/acq/license/cdlelsevier2004.pdf
legal integration: impossible.




      http://nar.oxfordjournals.org/cgi/content/full/gkm1037/DC1/1
3. moving from a document web
         to a data web.
a network
of devices
01-23-45-67-89-ab
a network
of documents
papers contain
     ideas,
  like boxes
contain books
papers contain
     ideas,
  like boxes
contain books
        http://foo.bar
causes
drink coffee            feel awake




a network
 of ideas
causes
      drink coffee            feel awake




http://foo.bar/ideas/causes
“graph” networks
making computers understand links between documents



                     links to
       Web page                    Web page
making computers understand relationships between concepts




                          causes
        drinking coffee            feel awake
http://ontology.foo.org/causes



                                          causes
          drinking coffee              ...
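A minimal sketch of the idea on this slide: each part of the statement "drinking coffee causes feel awake" gets its own URI, so a program can follow the relationship. The namespace is the slide's placeholder, the percent-encoding of spaces is an assumption, and `uri` and `objects` are hypothetical helper names.

```python
# Placeholder namespace taken from the slide; not a real ontology.
BASE = "http://ontology.foo.org/"


def uri(name: str) -> str:
    """Build a URI in the placeholder namespace (spaces escaped; an assumption)."""
    return BASE + name.replace(" ", "%20")


# Each statement is a (subject, predicate, object) triple of URIs.
triples = {
    (uri("drinking coffee"), uri("causes"), uri("feel awake")),
}


def objects(subject: str, predicate: str) -> set:
    """Return every object linked to `subject` by `predicate`."""
    return {o for s, p, o in triples if s == subject and p == predicate}


effects = objects(uri("drinking coffee"), uri("causes"))
```

Because the predicate is itself a URI, two datasets that both use it can be merged by taking the union of their triple sets.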
use the web to
           integrate information
            from different places
             and different names
“coffee”   “cafe”   coffee   http://ontology.foo.org/coffee   “kopi”
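One way to picture this slide, as a sketch only: different local names ("coffee", "cafe", "kopi") resolve to one shared URI, and records from different places are grouped under it. The source names `db_a`, `db_b`, `db_c` and the function `integrate` are hypothetical; the URI is the slide's placeholder.

```python
COFFEE = "http://ontology.foo.org/coffee"  # placeholder URI from the slide

# Local names used in different places, all naming the same concept.
LABEL_TO_URI = {"coffee": COFFEE, "cafe": COFFEE, "kopi": COFFEE}


def integrate(records):
    """Group (source, label) records by the URI their label resolves to."""
    merged = {}
    for source, label in records:
        concept = LABEL_TO_URI.get(label.lower())
        if concept is not None:  # unknown labels are skipped in this sketch
            merged.setdefault(concept, []).append(source)
    return merged


merged = integrate([("db_a", "coffee"), ("db_b", "kopi"), ("db_c", "Cafe")])
```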
causes
drink coffee            feel awake
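[diagram: concept graph linking bed, person, coffee, cup and related events through relations such as "located at", "causes", "is a" and "often near"]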
4. basic requirements for
  modular, package-based
approaches to “knowledge”
it starts with the public
         domain.
it takes ontologies.
“Kant saw the mind could not
    function as an empty container
    that simply receives data from
    the outside. Something had to be giving order to
    the incoming data...”
    - http://en.wikipedia.org/wiki/Immanuel_Kant
requires a modular,
  standards-based
approach to licensing.
+        +         +




+   is it legal?   +




+        +         +
license propagation: whatsoever you do to the least of
the databases, you do to the integrated knowledgebase

          (the most restrictive license wins)
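The propagation rule (the most restrictive license wins) can be sketched as taking a maximum over the source licenses. The license labels and their ranking below are hypothetical, chosen only for illustration; real licenses do not always fall into a single strict order.

```python
# Hypothetical ranking: a higher number means a more restrictive license.
RESTRICTIVENESS = {
    "public-domain": 0,
    "attribution": 1,
    "non-commercial": 2,
    "no-redistribution": 3,
}


def integrated_license(source_licenses):
    """Return the license that governs the integrated knowledgebase."""
    return max(source_licenses, key=RESTRICTIVENESS.__getitem__)


result = integrated_license(["public-domain", "attribution", "no-redistribution"])
```

A single restrictive source therefore constrains the whole integrated knowledgebase, which is why the deck starts from the public domain.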
a protocol, not a license
it takes some
namespace work.
database record that is about a thing



documentation that tells what a URI names
URI requirements
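1. The referent of the URI must be made clear
     through documentation.
2. Provision of such documentation via a widely deployed
     network protocol must be an ongoing concern.
3. The documentation provider must be responsive to community
     needs, such as the need to have mistakes fixed and the
     need for stability of reference.
4. Documentation must be open.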
URI requirements
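1. The referent of the URI must be made clear
     through documentation.
2. Provision of such documentation via a widely deployed
     network protocol must be an ongoing concern.
3. The documentation provider must be responsive to community
     needs, such as the need to have mistakes fixed and the
     need for stability of reference.
4. Documentation must be open.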
URI requirements
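1. The referent of the URI must be made clear
     through documentation.
2. Provision of such documentation via a widely deployed
     network protocol must be an ongoing concern.
3. The documentation provider must be responsive to community
     needs, such as the need to have mistakes fixed and the
     need for stability of reference.
4. Documentation must be open.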
Stability of reference (meaning, denotation)
         Stability of documentation
           Stability of the referent
URI requirements
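1. The referent of the URI must be made clear
     through documentation.
2. Provision of such documentation via a widely deployed
     network protocol must be an ongoing concern.
3. The documentation provider must be responsive to community
     needs, such as the need to have mistakes fixed and the
     need for stability of reference.
4. Documentation must be open.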
and what about ontologies?
copyrightable?

“it’s complicated.”
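• extension (quality control: spam and junk)

• remix (brand confusion, loss of integrity and attribution)

• common (failure to adhere to formats protocols or technology)

• persistence (the transient nature of all Web things...)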
5. proof of concept - open
source data integration for
       neuroscience.
a repository of ontologies,
namespaces, and integrated
         databases.


   http://neurocommons.org
e pluribus unum.
we can transform complex queries into links


            prefix go: <http://purl.org/obo/owl/GO#>
    prefix rdfs: <http:...
DRD1, 1812      adenylate cyclase activation
ADRB2, 154      adenylate cyclase activation
ADRB2, 154      arrestin mediate...
we can transform complex queries into links
http://hcls1.csail.mit.edu:8890/sparql/?query=prefix%20go%3A%20%3Chttp%3A%2F%2F...
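A sketch of how a query becomes a link like the one above: the SPARQL text is percent-encoded into the `query` parameter of a GET URL. The endpoint and the `query`, `format` and `maxrows` parameters are taken from the slide's link; the shortened query and the function name `query_to_link` are illustrative assumptions, and the code only builds the string without contacting the server.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit

ENDPOINT = "http://hcls1.csail.mit.edu:8890/sparql/"  # endpoint shown on the slide

# Shortened stand-in for the slide's query (an assumption, for illustration).
QUERY = (
    "prefix go: <http://purl.org/obo/owl/GO#>\n"
    "select ?process where { ?process ?p go:GO_0007166 }"
)


def query_to_link(sparql: str, max_rows: int = 50) -> str:
    """Percent-encode a SPARQL query into a shareable GET link."""
    params = {"query": sparql, "format": "", "maxrows": max_rows}
    # quote_via=quote encodes spaces as %20, matching the link on the slide.
    return ENDPOINT + "?" + urlencode(params, quote_via=quote)


link = query_to_link(QUERY)
# Decoding the link recovers the exact query text.
recovered = parse_qs(urlsplit(link).query)["query"][0]
```

Since the link carries the whole query, it can be bookmarked, emailed or cited like any other URL.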
we can transform complex queries into links
we can help scholars “remix” queries
  prefix go: <http://purl.org/obo/owl/GO#>
  prefix rdfs: <http://www.w3.org/2000/01/rd...
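"Remixing" can be sketched as swapping identifiers in a query template. The identifiers below are the ones on the two query slides (MeSH D017966 and GO_0007166 in the first, MeSH D009369 and GO_0006610 in the second); the template is an abbreviated fragment rather than the full query, and `remix` is a hypothetical helper name.

```python
from string import Template

# Abbreviated fragment of the slide's query; $mesh and $go mark the
# two terms a scholar would swap. Not a complete, runnable SPARQL query.
QUERY_TEMPLATE = Template(
    "select ?genename ?processname where {\n"
    "  ?paper ?p mesh:$mesh .\n"
    "  { ?process rdfs:subClassOf go:$go }\n"
    "}"
)


def remix(mesh: str, go: str) -> str:
    """Fill the template with a MeSH descriptor and a GO class identifier."""
    return QUERY_TEMPLATE.substitute(mesh=mesh, go=go)


original = remix("D017966", "GO_0007166")  # identifiers from the first query slide
remixed = remix("D009369", "GO_0006610")   # identifiers from the remixed query slide
```

Only the two identifiers change; the structure of the query is reused as is.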
we can build a corpus of queries as links
we can re-use cultural tools for scholarship
http://sparql.neurocommons.org
“a running Neurocommons mirror consumes
     a fair amount of system resources”
http://kingsley.idehen.name:8890

http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/VirtEC2AMINeuroCommonsInstall
conclusion?
a. it’s very hard work to use
 the semantic web right now

b. it’s worth it if you have the
cognitive overload problem.

c. none of it works without an
open knowledge approach
thank you

wilbanks@creativecommons.org

  http://sciencecommons.org
Cni Fall 08 Briefing Science Commons
Coalition for Networked Information annual fall meeting. Project briefing on the Neurocommons project.

Published in: Education, Technology
Cni Fall 08 Briefing Science Commons

  1. 1. CNI fall 2008 e-research, data integration john wilbanks creative commons / science commons
  2. 2. 1. e-research requires new approaches to data integration.
  3. 3. databases as unique entities, instead of nodes in a network
  4. 4. http://nar.oxfordjournals.org/cgi/content/full/gkm1037/DC1/1
  5. 5. “packages”
  6. 6. not monolithic not centralized
  7. 7. scalable
  8. 8. aggregation
  9. 9. not-software scalable modular
  10. 10. what about science?
  11. 11. science is not unlike wikipedia... ...except authenticated, and expensive.
  12. 12. inefficient and expensive ecosystem of processes to peer-produce and review scholarly content
  13. 13. from a technical perspective
  14. 14. 2. multivariate connected barriers to new methods of data integration
  15. 15. cognitive barriers.
  16. 16. the knowledge was human-scale
  17. 17. web 2.0, science 3.0, what about making Google work better?
  18. 18. over 200 years at one paper/day
  19. 19. what you want is a list of genes. not a list of documents.
  20. 20. technical barriers.
  21. 21. IGFBP-5 plays a role in the regulation of cellular senescence via a p53-dependent pathway and in aging-associated vascular diseases
  22. 22. IGFBP-5 plays a role in the regulation of cellular senescence via a p53-dependent pathway and in aging-associated vascular diseases
  23. 23. tradition barriers.
  24. 24. legal barriers.
  25. 25. indexing: disallowed. http://orpheus-1.ucsd.edu/acq/license/cdlelsevier2004.pdf
  26. 26. legal integration: impossible. http://nar.oxfordjournals.org/cgi/content/full/gkm1037/DC1/1
  27. 27. 3. moving from a document web to a data web.
  28. 28. a network of devices
  29. 29. 01-23-45-67-89-ab
  30. 30. a network of documents papers contain ideas, like boxes contain books
  31. 31. papers contain ideas, like boxes contain books http://foo.bar
  32. 32. causes drink coffee feel awake a network of ideas
  33. 33. causes drink coffee feel awake http://foo.bar/ideas/causes
  34. 34. “graph” networks
  35. 35. making computers understand links between documents links to Web page Web page
  36. 36. making computers understand relationships between concepts causes drinking coffee feel awake
  37. 37. http://ontology.foo.org/causes causes drinking coffee feel awake http://ontology.foo.org/drinking coffee http://ontology.foo.org/feel awake h
  38. 38. use the web to integrate information from different places and different names “coffee” “cafe” coffee http://ontology.foo.org/coffee “kopi”
  39. 39. causes drink coffee feel awake
  40. 40. bed person located at get out of bed last subevent does not want wants get out of bed after causes drink coffee feel awake first subevent subevent causes feel jittery open eyes after after make coffee pour coffee pick up cup drink is a is for located in coffee cafe property of often near often near wet cup sugar
  41. 41. 4. basic requirements for modular, package-based approaches to “knowledge”
  42. 42. it starts with the public domain.
  43. 43. it takes ontologies.
  44. 44. “Kant saw the mind could not function as an empty container that simply receives data from the outside. Something had to be giving order to the incoming data...” - http://en.wikipedia.org/wiki/Immanuel_Kant
  45. 45. requires a modular, standards-based approach to licensing.
  46. 46. + + + + is it legal? + + + +
  47. 47. license propagation: whatsoever you do to the least of the databases, you do to the integrated knowledgebase (the most restrictive license wins)
  48. 48. a protocol, not a license
  49. 49. it takes some namespace work.
  50. 50. database record that is about a thing documentation that tells what a URI names
  51. 51. URI requirements 1. The referent of the URI must be made clear through documentation. 2. Provision of such documentation via a widely deployed network protocol must be an ongoing concern. 3. The documentation provider must be responsive to community needs, such as the need to have mistakes fixed and the need for stability of reference. 4. Documentation must be open.
  52. 52. URI requirements 1. The referent of the URI must be made clear through documentation. 2. Provision of such documentation via a widely deployed network protocol must be an ongoing concern. 3. The documentation provider must be responsive to community needs, such as the need to have mistakes fixed and the need for stability of reference. 4. Documentation must be open.
  53. 53. URI requirements 1. The referent of the URI must be made clear through documentation. 2. Provision of such documentation via a widely deployed network protocol must be an ongoing concern. 3. The documentation provider must be responsive to community needs, such as the need to have mistakes fixed and the need for stability of reference. 4. Documentation must be open.
  54. 54. Stability of reference (meaning, denotation) Stability of documentation Stability of the referent
  55. 55. URI requirements 1. The referent of the URI must be made clear through documentation. 2. Provision of such documentation via a widely deployed network protocol must be an ongoing concern. 3. The documentation provider must be responsive to community needs, such as the need to have mistakes fixed and the need for stability of reference. 4. Documentation must be open.
  56. 56. and what about ontologies?
  57. 57. copyrightable? “it’s complicated.”
  58. 58. • extension (quality control: spam and junk) • remix (brand confusion, loss of integrity and attribution) • common (failure to adhere to formats protocols or technology) • persistence (the transient nature of all Web things...)
  59. 59. 5. proof of concept - open source data integration for neuroscience.
  60. 60. a repository of ontologies, namespaces, and integrated databases. http://neurocommons.org
  61. 61. e pluribus unum.
  62. 62. we can transform complex queries into links [slide labels: Mesh: Pyramidal Neurons; Pubmed: Journal Articles; Entrez Gene: Genes; GO: Signal Transduction] prefix go: <http://purl.org/obo/owl/GO#> prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> prefix owl: <http://www.w3.org/2002/07/owl#> prefix mesh: <http://purl.org/commons/record/mesh/> prefix sc: <http://purl.org/science/owl/sciencecommons/> prefix ro: <http://www.obofoundry.org/ro/ro.owl#> select ?genename ?processname where { graph <http://purl.org/commons/hcls/pubmesh> { ?paper ?p mesh:D017966 . ?article sc:identified_by_pmid ?paper. ?gene sc:describes_gene_or_gene_product_mentioned_by ?article. } graph <http://purl.org/commons/hcls/goa> { ?protein rdfs:subClassOf ?res. ?res owl:onProperty ro:has_function. ?res owl:someValuesFrom ?res2. ?res2 owl:onProperty ro:realized_as. ?res2 owl:someValuesFrom ?process. graph <http://purl.org/commons/hcls/20070416/classrelations> {{?process <http://purl.org/obo/owl/obo#part_of> go:GO_0007166} union {?process rdfs:subClassOf go:GO_0007166 }} ?protein rdfs:subClassOf ?parent. ?parent owl:equivalentClass ?res3. ?res3 owl:hasValue ?gene. } graph <http://purl.org/commons/hcls/gene> { ?gene rdfs:label ?genename } graph <http://purl.org/commons/hcls/20070416> { ?process rdfs:label ?processname} }
  63. 63.
      DRD1, 1812 adenylate cyclase activation
      ADRB2, 154 adenylate cyclase activation
      ADRB2, 154 arrestin mediated desensitization of G-protein coupled receptor protein signaling pathway
      DRD1IP, 50632 dopamine receptor signaling pathway
      DRD1, 1812 dopamine receptor, adenylate cyclase activating pathway
      DRD2, 1813 dopamine receptor, adenylate cyclase inhibiting pathway
      GRM7, 2917 G-protein coupled receptor protein signaling pathway
      GNG3, 2785 G-protein coupled receptor protein signaling pathway
      GNG12, 55970 G-protein coupled receptor protein signaling pathway
      DRD2, 1813 G-protein coupled receptor protein signaling pathway
      ADRB2, 154 G-protein coupled receptor protein signaling pathway
      CALM3, 808 G-protein coupled receptor protein signaling pathway
      HTR2A, 3356 G-protein coupled receptor protein signaling pathway
      DRD1, 1812 G-protein signaling, coupled to cyclic nucleotide second messenger
      SSTR5, 6755 G-protein signaling, coupled to cyclic nucleotide second messenger
      MTNR1A, 4543 G-protein signaling, coupled to cyclic nucleotide second messenger
      CNR2, 1269 G-protein signaling, coupled to cyclic nucleotide second messenger
      HTR6, 3362 G-protein signaling, coupled to cyclic nucleotide second messenger
      GRIK2, 2898 glutamate signaling pathway
      GRIN1, 2902 glutamate signaling pathway
      GRIN2A, 2903 glutamate signaling pathway
      GRIN2B, 2904 glutamate signaling pathway
      ADAM10, 102 integrin-mediated signaling pathway
      GRM7, 2917 negative regulation of adenylate cyclase activity
      LRP1, 4035 negative regulation of Wnt receptor signaling pathway
      ADAM10, 102 Notch receptor processing
      ASCL1, 429 Notch signaling pathway
      HTR2A, 3356 serotonin receptor signaling pathway
      ADRB2, 154 transmembrane receptor protein tyrosine kinase activation (dimerization)
      PTPRG, 5793 transmembrane receptor protein tyrosine kinase signaling pathway
      EPHA4, 2043 transmembrane receptor protein tyrosine kinase signaling pathway
      NRTN, 4902 transmembrane receptor protein tyrosine kinase signaling pathway
      CTNND1, 1500 Wnt receptor signaling pathway
  64. 64. we can transform complex queries into links http://hcls1.csail.mit.edu:8890/sparql/?query=prefix%20go%3A%20%3Chttp%3A%2F%2Fpurl.org%2Fobo%2Fowl%2FGO%23%3E%0Aprefix%20rdfs%3A%20%3Chttp%3A%2F%2Fwww.w3.org%2F2000%2F01%2Frdf-schema%23%3E%0Aprefix%20owl%3A%20%3Chttp%3A%2F%2Fwww.w3.org%2F2002%2F07%2Fowl%23%3E%0Aprefix%20mesh%3A%20%3Chttp%3A%2F%2Fpurl.org%2Fcommons%2Frecord%2Fmesh%2F%3E%0Aprefix%20sc%3A%20%3Chttp%3A%2F%2Fpurl.org%2Fscience%2Fowl%2Fsciencecommons%2F%3E%0Aprefix%20ro%3A%20%3Chttp%3A%2F%2Fwww.obofoundry.org%2Fro%2Fro.owl%23%3E%0A%0Aselect%20%3Fgenename%20%3Fprocessname%0Awhere%0A%7B%20%20graph%20%3Chttp%3A%2F%2Fpurl.org%2Fcommons%2Fhcls%2Fpubmesh%3E%0A%20%20%20%20%20%7B%20%3Fpaper%20%3Fp%20mesh%3AD017966%20.%0A%20%20%20%20%20%20%20%3Farticle%20sc%3Aidentified_by_pmid%20%3Fpaper.%0A%20%20%20%20%20%20%20%3Fgene%20sc%3Adescribes_gene_or_gene_product_mentioned_by%20%3Farticle.%0A%20%20%20%20%20%7D%0A%20%20%20graph%20%3Chttp%3A%2F%2Fpurl.org%2Fcommons%2Fhcls%2Fgoa%3E%0A%20%20%20%20%20%7B%20%3Fprotein%20rdfs%3AsubClassOf%20%3Fres.%0A%20%20%20%20%20%20%20%3Fres%20owl%3AonProperty%20ro%3Ahas_function.%0A%20%20%20%20%20%20%20%3Fres%20owl%3AsomeValuesFrom%20%3Fres2.%0A%20%20%20%20%20%20%20%3Fres2%20owl%3AonProperty%20ro%3Arealized_as.%0A%20%20%20%20%20%20%20%3Fres2%20owl%3AsomeValuesFrom%20%3Fprocess.%0A%20%20%20graph%20%3Chttp%3A%2F%2Fpurl.org%2Fcommons%2Fhcls%2F20070416%2Fclassrelations%3E%0A%20%20%20%20%20%7B%7B%3Fprocess%20%3Chttp%3A%2F%2Fpurl.org%2Fobo%2Fowl%2Fobo%23part_of%3E%20go%3AGO_0007166%7D%0A%20%20%20%20%20%20%20union%0A%20%20%20%20%20%20%7B%3Fprocess%20rdfs%3AsubClassOf%20go%3AGO_0007166%20%7D%7D%0A%20%20%20%20%20%20%20%3Fprotein%20rdfs%3AsubClassOf%20%3Fparent.%0A%20%20%20%20%20%20%20%3Fparent%20owl%3AequivalentClass%20%3Fres3.%0A%20%20%20%20%20%20%20%3Fres3%20owl%3AhasValue%20%3Fgene.%0A%20%20%20%20%20%20%7D%0A%20%20%20graph%20%3Chttp%3A%2F%2Fpurl.org%2Fcommons%2Fhcls%2Fgene%3E%0A%20%20%20%20%20%7B%20%3Fgene%20rdfs%3Alabel%20%3Fgenename%20%7D%0A%20%20%20graph%20%3Chttp%3A%2F%2Fpurl.org%2Fcommons%2Fhcls%2F20070416%3E%0A%20%20%20%20%20%7B%20%3Fprocess%20rdfs%3Alabel%20%3Fprocessname%7D%0A%7D&format=&maxrows=50
  65. 65. we can transform complex queries into links
  66. 66. we can help scholars “remix” queries [slide labels: Mesh: Cancer; GO: Ribosomal Protein] prefix go: <http://purl.org/obo/owl/GO#> prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> prefix owl: <http://www.w3.org/2002/07/owl#> prefix mesh: <http://purl.org/commons/record/mesh/> prefix sc: <http://purl.org/science/owl/sciencecommons/> prefix ro: <http://www.obofoundry.org/ro/ro.owl#> select ?genename ?processname where { graph <http://purl.org/commons/hcls/pubmesh> { ?paper ?p mesh:D009369 . ?article sc:identified_by_pmid ?paper. ?gene sc:describes_gene_or_gene_product_mentioned_by ?article. } graph <http://purl.org/commons/hcls/goa> { ?protein rdfs:subClassOf ?res. ?res owl:onProperty ro:has_function. ?res owl:someValuesFrom ?res2. ?res2 owl:onProperty ro:realized_as. ?res2 owl:someValuesFrom ?process. graph <http://purl.org/commons/hcls/20070416/classrelations> {{?process <http://purl.org/obo/owl/obo#part_of> go:GO_0006610} union {?process rdfs:subClassOf go:GO_0006610 }} ?protein rdfs:subClassOf ?parent. ?parent owl:equivalentClass ?res3. ?res3 owl:hasValue ?gene. } graph <http://purl.org/commons/hcls/gene> { ?gene rdfs:label ?genename } graph <http://purl.org/commons/hcls/20070416> { ?process rdfs:label ?processname} }
  67. 67. we can build a corpus of queries as links
  68. 68. we can re-use cultural tools for scholarship
  69. 69. http://sparql.neurocommons.org
  70. 70. “a running Neurocommons mirror consumes a fair amount of system resources”
  71. 71. http://kingsley.idehen.name:8890 http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/VirtEC2AMINeuroCommonsInstall
  72. 72. conclusion?
  73. 73. a. it’s very hard work to use the semantic web right now b. it’s worth it if you have the cognitive overload problem. c. none of it works without an open knowledge approach
  74. 74. thank you wilbanks@creativecommons.org http://sciencecommons.org