knowledge sharing
  in the sciences

          kaitlin thaney
program manager, science commons
   barcelona, spain - 1 july 2009


This presentation is licensed under the Creative Commons Attribution 3.0 license.
information sharing is at the root of
      scholarship and science

   the system of print publishing is a
     system of sharing knowledge

  then came the move to digital ...
the web revolutionized
search, commerce, collaboration
sharing became cheaper,
        easier technically

costs of copying, moving, storing ...
       down to nearly zero

  ability to link between nodes of
information (dating back to the 1980s)
yet ...
   most of the useful knowledge
         is inaccessible.

  most of the useful knowledge is
   in the wrong technology.

we don’t have enough people working
         on the problem(s).
(0) the “research web”

(1) step 1: opening access

(2) step 2: access to research tools

(3) step 3: access to data

(4) step 4: open cyberinfrastructure

(5) what’s next?
make sharing easy, legal and scalable

        integrated approach

building part of the infrastructure for
          knowledge sharing
the “research web”

making the web work better for science

integrating disparate knowledge sources

make better use of existing information
          in the digital form
knowledge?

    journal articles
          data
       ontologies
      annotations
plasmids and cell lines
have capability to drastically increase
       sharing at lower cost ...

    ... though, still roadblocks ...

    silos of knowledge, walls of cost,
  secrecy, lagging incentive system for
        collaboration and sharing
step one

... it all starts with access to the
   scientific content and data ...
scientific revolutions occur when a
 sufficient body of data accumulates to
   overthrow the dominant theories
        we use to frame reality

     a so-called paradigm shift

                    - from thomas kuhn
scholarship entrenched in idea of
 transmitting knowledge via paper

mentality reflected even in the way we
          describe “papers”

 static, one-dimensional documents
in the digital world, “papers” can
  become living, breathing works

 no longer static PDF documents

linking to data sets, other relevant
papers, information, plasmids, genes
oldest scientific
    journal
  published in
    english-
speaking world

     1665
need to change the way we think of
        scholarly publishing,
       of knowledge sharing

        paradigm shift

   begin thinking of “papers” as
    containers of knowledge
“papers”


           IGFBP-5 plays a role in the
           regulation of cellular senescence
           via a p53-dependent pathway
           and in aging-associated vascular
           diseases
“networked knowledge”


            IGFBP-5 plays a role in the
            regulation of cellular senescence
            via a p53-dependent pathway
            and in aging-associated vascular
            diseases
content needs to be legally and
    technically accessible


                 we’ll start with legal ...
thinking of “papers” more as containers of
                knowledge



   copyright locks that container
traditional transfer of copyright agreement
Open Access (OA)
“By open access to the literature, we mean its free
 availability on the public internet, permitting users to
read, download, copy, distribute, print, search, or link to
 the full texts of the articles, crawl them for indexing,
  pass them as data to software, or use them for any
    other lawful purpose, without financial, legal or
 technical barriers other than those inseparable from
           gaining access to the internet itself.”



          Image from the Public Library of Science, licensed to the public, under
                                       CC-BY-3.0
“The only constraint on reproduction and distribution,
 and the only role for copyright in this domain, should
 be to give authors control over the integrity of their
 work and the right to be properly acknowledged and
                         cited.”
http://creativecommons.org/licenses/
legal
implementation
step two

access to research tools from
       funded research
examples:
lab mice, cell lines,
 DNA, stem cells

 ... the physical
      materials

office supplies for
     science
ideally ...

 contact author, obtain material,
      recreate experiment

build on the existing work, publish

          and repeat ...
the reality ...
  materials difficult to find, requests hard to
        fulfill, lack of resources

reagents and assays often re-invented
       or reverse engineered

    locked in contracts, bureaucracy,
deliberate withholding, “club mentality”
no office superstores for
        science

no internet marketplaces
       for science
another way to think of it ...
solves the access problem via
           contract
        UBMTA, SLA, SCMTA
(standardized material transfer agreements, or MTAs)

 standard icons, CC methodology, metadata
step three

data and the public domain
legal issues:

“it’s complicated”
copyright and databases

        what’s protected? is it legal?

              facts are free

to what extent is there creative expression?
database protections based on jurisdiction

              sui generis,
          “sweat of the brow”
            Crown copyright

          the list goes on ...
social issues:

 protection instinct / culture of control

PD relinquishes much of this control, even
    control in the service of freedom

     “my data”, interpretation issues

      fear, uncertainty, doubt (FUD)
issue of license proliferation

   whatever you do to the least of the
databases, you do to the integrated system

       (the most restrictive wins)
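
The "most restrictive wins" point can be illustrated with a minimal sketch. It assumes, as a deliberate simplification, that each database's license can be reduced to a set of obligation labels and that an integrated system inherits the union of them. The `Dataset` class, the obligation labels, and the example databases are hypothetical; real license compatibility is a legal question, and some combinations cannot be merged at all.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    """A database plus the obligations its license imposes (hypothetical labels)."""
    name: str
    obligations: frozenset


def integrated_obligations(datasets):
    """Return the obligations carried by a system built from all datasets.

    Simplifying assumption: obligations only accumulate, so the result is
    the union of every input's obligations.
    """
    combined = frozenset()
    for dataset in datasets:
        combined |= dataset.obligations
    return combined


# Hypothetical inputs: one public-domain source and two restricted ones.
public_domain = Dataset("pd_facts", frozenset())
share_alike = Dataset("sa_db", frozenset({"attribution", "share-alike"}))
non_commercial = Dataset("nc_db", frozenset({"attribution", "non-commercial"}))

merged = integrated_obligations([public_domain, share_alike, non_commercial])
print(sorted(merged))
```

Under this assumption, a single restricted source is enough to restrict the whole integrated system, while a system built only from public-domain sources carries no obligations.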
need for a legally accurate and
              simple solution

reducing or eliminating the need to make the
       distinction of what’s protected

requires modular, standards-based approach
                  to licensing
our solution ...

  reconstruction of the public domain

 create legal zones of certainty for data

attribution through accompanying norms
3.1 The protocol must promote legal predictability
and certainty.

3.2 The protocol must be easy to use and understand.

3.3 The protocol must impose the lowest possible
transaction costs on users.

For the full text:
http://sciencecommons.org/projects/publishing/open-access-data-protocol/
CC Zero waiver + SC norms



  waive rights → public domain

  attribution / citation through
community norms, not a contract
a protocol, not a license
calls for data providers to waive all rights
necessary for data extraction and re-use

  requires provider place no additional
    obligations (like share-alike) to limit
              downstream use

 request behavior (like attribution) through
        norms and terms of use
public domain ≠ license, cannot be made
       “more free” - only less free

     PD = the original commons

      at least make metadata open,
   if one can’t make data itself open
early adopters,
committing to make their data open
            using CC0

   (1) Tranche - free, open source
   (2) Personal Genome Project
   (3) Digg, Flickr, WhiteHouse.gov
   (4) EMBL SIDER, TDI Kernel
technical considerations:

            persistent URLs
        open, stable namespaces
   standards, standards, standards
facilitate integration, interoperability

                          and more ...
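
One way to picture persistent URLs over a stable namespace is a small resolver sketch. Everything here is an assumption for illustration: the `Resolver` class, the `purl.example.org` base, the `gene` namespace, and the storage locations are hypothetical and do not describe any Science Commons service.

```python
class Resolver:
    """Maps stable, namespaced identifiers to wherever the data currently lives."""

    def __init__(self, base):
        self.base = base.rstrip("/")
        self._locations = {}

    def register(self, namespace, local_id, location):
        """Record where an item lives and return its persistent URL."""
        self._locations[(namespace, local_id)] = location
        return f"{self.base}/{namespace}/{local_id}"

    def move(self, namespace, local_id, new_location):
        """Update the storage location; the persistent URL does not change."""
        if (namespace, local_id) not in self._locations:
            raise KeyError(f"unknown identifier: {namespace}/{local_id}")
        self._locations[(namespace, local_id)] = new_location

    def resolve(self, persistent_url):
        """Look up the current location behind a persistent URL."""
        prefix = self.base + "/"
        if not persistent_url.startswith(prefix):
            raise ValueError("not a URL in this resolver's namespace")
        namespace, _, local_id = persistent_url[len(prefix):].partition("/")
        return self._locations[(namespace, local_id)]


# Hypothetical example: the data moves, the published link stays valid.
resolver = Resolver("https://purl.example.org")
url = resolver.register("gene", "IGFBP5", "https://lab-a.example.org/data/igfbp5.json")
resolver.move("gene", "IGFBP5", "https://archive.example.org/genes/igfbp5.json")
print(url, "->", resolver.resolve(url))
```

The design choice shown is indirection: papers and databases cite the stable URL, and only the resolver's table changes when the underlying data is relocated.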
step four

invest in open cyberinfrastructure
data without structure and annotation is a
            lost opportunity.

data should flow in an open, public, and
        extensible infrastructure

support recombination and reconfiguration
into computer models, queryable by search
                engine

        treated as public good
change requires a new legal infrastructure
      to encourage collaboration

         traits of legal protocols:

               legally accurate
            simple for scientists
           low transaction costs
         facilitate interoperability
        business and user friendly
what can you do?
             lead by example ...
design for maximum reuse

   ensure the freedom to integrate

 leverage existing open infrastructure

allows for snap-together integration of
    the tools, data, research literature
what’s needed?

common standards, right software
  accessible data and content
       open infrastructure
  build for network effects
thank you
kaitlin@creativecommons.org
      sciencecommons.org
       neurocommons.org

Knowledge Sharing in the Sciences - 8JPL
