Data Sharing & Data Citation
Micah Altman, Institute for Quantitative Social Science, Harvard University
Prepared for: data coding, analysis, archiving, and sharing for open collaboration (NSF, Sept 15-16, 2011)
Collaborators*
Margaret Adams, George Alter, Leonid Andreev, Ed Bachman, Adam Buchbinder, Ken Bollen, Bryan Beecher, Steve Burling, Tom Carsey, Kevin Condon, Jonathan Crabtree, Merce Crosas, Darrell Donakowski, Myron Gutmann, Gary King, Patrick King, Tom Lipkis, Freeman Lo, Jared Lyle, Marc Maynard, Nancy McGovern, Amy Pienta, Lois Timms-Ferrara, Akio Sone, Bob Treacy, Copeland Young
Research Support
Thanks to the Library of Congress (PA#NDP03-1), the National Science Foundation (DMS-0835500, SES 0112072), IMLS (LG-05-09-0041-09), the Harvard University Library, the Institute for Quantitative Social Science, the Harvard-MIT Data Center, and the Murray Research Archive.
* And co-conspirators
Related Work
- Altman, M., and J. Crabtree. 2011. “Using the SafeArchive System: TRAC-Based Auditing of LOCKSS.” Proceedings of Archiving 2011.
- Crosas, M. 2011. “The Dataverse Network: An Open-Source Application for Sharing, Discovering and Preserving Data.” D-Lib Magazine 17(1/2).
- Altman, M., M. Adams, J. Crabtree, D. Donakowski, M. Maynard, A. Pienta, and C. Young. 2009. “Digital Preservation through Archival Collaboration: The Data Preservation Alliance for the Social Sciences.” The American Archivist 72(1): 169-182.
- Gutmann, M., M. Abrahamson, M. O. Adams, M. Altman, C. Arms, K. Bollen, M. Carlson, J. Crabtree, D. Donakowski, G. King, J. Lyle, M. Maynard, A. Pienta, R. Rockwell, L. Timms-Ferrara, and C. Young. 2009. “From Preserving the Past to Preserving the Future: The Data-PASS Project and the Challenges of Preserving Digital Social Science Data.” Library Trends 57(3): 315-33.
- Altman, M. 2008. “A Fingerprint Method for Verification of Scientific Data.” In Advances in Systems, Computing Sciences and Software Engineering (Proceedings of the International Conference on Systems, Computing Sciences and Software Engineering 2007). Springer Verlag.
- Altman, M., and G. King. 2007. “A Proposed Standard for the Scholarly Citation of Quantitative Data.” D-Lib Magazine 13(3/4) (March/April).
- King, G. 2007. “An Introduction to the Dataverse Network as an Infrastructure for Data Sharing.” Sociological Methods and Research, Vol. 32, No. 2, pp. 173-199.
Motivations
Access to Data is the Foundation of Science
- Science is not (only) about being scientific
  - Scientific progress requires community: competition and collaboration in the pursuit of common goals
  - Without access to the same materials, no community exists
  - … data is the nucleus of scientific collaboration
- The value of an article that can’t be replicated: ?
  - Scholarly articles are summaries, not the actual research results
  - Experimental data are expensive to reproduce; observational data are impossible to reproduce
  - Hard for journal editors to verify: if you find the data, how do you know it’s the same?
  - Replication projects show that many published articles cannot be replicated
  - … data is needed for scientific replication
Sources: Fienberg et al. 1985; ICSU 2004; Nature 2009
Open Data Broadens & Deepens Impact
- Data-Intensive Science
  - Increased opportunities for interdisciplinarity
  - Science modeling across multiple scales
  - Continuous, complete, fine-grained information on physical processes, systems, human behavior
- Education
  - Data eases the transition from education to research
- Open Data Democratizes Science
  - Citizen-scientists
  - Developing countries
  - Researchers outside the inner circle of an institution
  - Crowd-sourcing, open notebooks, and mashups
- Data Sharing Increases Publication Impact [Gleditsch 2003; Wilson 2008; Piwowar 2007]
Data is Key to Government
- Statistics = state-istics
- Reformers use data
  - To assess the performance of the state
  - To assess social conditions
- Governments attempt to control access to data to evade accountability
- Policy debates often center on data
  - War on poverty, civil rights, consumer protection: all made heavy use of statistical arguments
  - Economic and environmental policies are data-intensive
- Data access brings together both sides of the political spectrum
  - In a modern democracy the public needs a direct source of information
  - Liberals and conservatives support access to data informing policy
Image source: “Propaganda” http://www.media-studies.ca/articles/images/berlin_wall.jpg
Sources: Gough 2003; Shulman 2006; Wagner & Steinzor 2006; Alonzo and Starr 1988
Open Data is “Research Insurance”
- Keeps options open after the nominal end of the project; extends the lifecycle
  - Continuation projects
  - Publication revisions
  - Broader research programs
- Insures against loss of “project memory”
  - Departure of senior personnel from the institution
  - Departure of post-docs and graduate students
  - Accidental loss of data due to local IT failures
  - Reduces questions from secondary analysts
- Insures against intentional and unintentional errors
  - All collaborators can verify results prior to publication
  - Enables more intensive peer review
Source: Berman et al. 2008
Data Sharing Across Communities
- Data sharing practices vary greatly across communities
- [Figure labels: Proprietary; Formal sharing; Formal deposit; Open Data]
- Significant correlates: tacit knowledge, individual investment of time in data collection, confidentiality, journal practices, funder policies & practices
Source: R.I.N. 2008; also see Borgman 2007; Niu 2006
So when do things go wrong?
Source: Reich & Rosenthal 2005
Confidentiality Restrictions for Personal Private Information
- Overlapping laws differ in:
  - People/subjects covered
  - Organizations covered
  - Required technical and procedural controls
  - Definition of identifiability
- Some Strategies
  - Consent for sharing up front
  - Commercialize
  - Observe public activity
  - Share aggregates only
  - De-identify
- Recent Statistical Results (Oversimplified)
  - De-identification often leaks
  - Aggregation sometimes leaks
- Not included: EU directives, foreign laws, ANPRM (Request for Comment on proposed revisions to 45 CFR 46, www.hhs.gov/ohrp/humansubjects/anprm2011page.html)
Integrating Tools
Data Management - Goals
Data Management Elements
Core Requirements for Data Sharing Infrastructure
- Stakeholder incentives: recognition; citation; payment; compliance; services
- Dissemination: access to metadata; documentation; data
- Access control: authentication; authorization; rights management
- Provenance: chain of control; verification of metadata, bits, semantic content
- Persistence: bits; semantic content; use
- Legal protection: rights management; consent; record keeping; auditing
- Usability: discovery; deposit; curation; administration; collaboration
- Business model
Sources: King 2007; ICSU 2004; NSB 2005
Why is Infrastructure for Data Sharing Necessary?
- Accessibility:
  - Many large data sets: in public archives
  - Most data in published articles: not accessible; results not replicable without the original author
  - Most data sets from federal grants: not publicly available
- Problems with discovery and linking, even with professional archives:
  - Data in different archives have different identifiers
  - Archives change identifiers and links
  - Changes to data are made; identifiers are reused or removed; old data are lost
  - Locating/browsing/extracting requires specialized tools & approaches
- Sharing data requires exposing tacit knowledge
  - Explicit documentation of data structure, collection process, interpretation
  - Harmonizing/linking to known ontologies, metadata schemas, vocabularies
- Data sets are not preserved like books
  - Static data files (even if on the web): unreadable after a few years
  - When storage methods change: some data sets are lost; others have altered content!
- Why not a single centralized infrastructure?
  - Single point of failure
  - Difficult when data are heterogeneous in format, origin, size, effort needed to collect or analyze, legal access rules, etc.
  - Data producers want credit, control, and visibility
Dataverse
For Scholars
- Brand it like your own website
- Upload any type of data
- Establish a persistent data citation
- Facilitate data discovery
- Provide live analysis
- Receive permanent storage space
For Organizations
- Used by archives, libraries, journals, schools
- Enable contributors to upload data
- Organize studies by collections
- Search across a universe of data
- Control access and terms of use
- Federate with catalogs and partners: OAI-PMH, LOCKSS, Z39.50, DDI

- Gateway to over 39000 social science studies (world’s largest catalog)
- Web Virtual Hosting 2.0 Service: over 350 virtual archives
- Federated search and delivery
Virtual Archive: Scholar Site
- Scholar retains control over branding and dissemination
- Preservation and long-term access are guaranteed
- Dissemination and compliance with Data Management Plans are verifiable
- Integrates with OpenScholar
Interoperability & Integration
Mind the Gaps
- GAP: Coverage across the entire lifecycle; decoupling of dissemination, formal publication, long-term access, reuse
- GAP: Interoperability and integration across tools
- GAP: Maturity and sustainability of tools; most tools have small communities of maintainers, which is particularly worrisome given the lack of interoperability
[Figure labels: design, publishing, dissemination, archiving, reuse, collection, processing, integration, analysis; CATI/CAPI; enhanced publication (Sweave); identifiers; Google-__________; data archives, hosting, networks; general digital libraries and repositories; scientific workflow systems]
Supporting Institutions
Institutional Data Access Strategies*
- “Ignore it, maybe someone else will take care of it” (Internet Archive, …)
- “We’ll always be here” (self-preservation)
- “Let the publishers do it”
- “We are ever true to [Insert Alma Mater]” (institutional archives)
- “Ask us (domain archive) to do it” (ICPSR, MRA, Roper, …)
- “Ask someone(s) else to do it” (Data-PASS, Meta-Archive, CLOCKSS)
- “Trust No One” (LOCKSS)
*All quotes are entirely fictional :-)
Institutional Preservation Strategies: Corollaries
- There are potential single points of failure in technology, organization, and legal regimes:
  - Diversify your portfolio: multiple software systems, hardware, organization (e.g., Data-PASS :-)
  - Seek international partners
- Many combinations of preservation & dissemination strategies are compatible:
  - Layer technologies and strategies
  - Leverage dissemination (in a planned way) for preservation (and vice versa)
- Preservation is impossible to demonstrate conclusively:
  - Consider organizational credentials
  - No organization is absolutely certain to be reliable
Partnership Agreements
- MOU
- Succession Plans & Agreements
- Coordinating Operations
- Development of shared procedures
- Joint “Not-bad” practices
- Identification & selection
- Metadata
- Confidentiality
- Shared Catalog
- Unified Discovery
- Content replication
Data-PASS is a broad-based partnership of data archives dedicated to acquiring and preserving data at risk of being lost to the social science research community. Data-PASS partners have rescued thousands of data sets and created the largest catalog of social science data in existence. Data-PASS partners collaborate to identify and promote good archival practices, seek out at-risk research data, build preservation infrastructure, and mutually safeguard
Ideal integration of policy and technology?
- Expressed in high-level domain/business language
- Captures a significant portion of the business domain
- Translated to a formal schematization
- Automatically measurable
- Directly controls procedures & actions to achieve compliance
- Verifiable translation from business domain policy
Policy: a set of rules and objectives, expressed at a high domain level, that controls actions at a lower level
“The repository system must be able to identify the number of copies of all stored digital objects, and the location of each object and their copies.”
[Diagram labels: Policy; Schematization; Behavior (Operationalization)]
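The quoted requirement can be read as a small schematization exercise. What follows is a minimal sketch, assuming a hypothetical inventory of (object id, storage location) pairs and a hypothetical minimum copy count; neither the data structure nor the threshold comes from TRAC or from SafeArchive.

```python
from collections import defaultdict


def copy_report(inventory):
    """Map each object id to the sorted list of locations holding a copy."""
    locations = defaultdict(set)
    for object_id, location in inventory:
        locations[object_id].add(location)
    return {obj: sorted(locs) for obj, locs in locations.items()}


def under_replicated(report, min_copies):
    """Return ids of objects stored in fewer than min_copies locations."""
    return sorted(obj for obj, locs in report.items() if len(locs) < min_copies)


# Hypothetical inventory: object ids and archive names are made up.
inventory = [
    ("study-001", "archive-a"),
    ("study-001", "archive-b"),
    ("study-001", "archive-c"),
    ("study-002", "archive-a"),
]

report = copy_report(inventory)
print(report["study-001"])                     # where the copies live
print(len(report["study-001"]))                # how many copies exist
print(under_replicated(report, min_copies=3))  # objects that miss the policy
```

The point of the sketch is the translation step: the prose policy becomes two measurable quantities (copy count and copy locations) that an audit could check automatically.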
SafeArchive: TRAC-Based Management of LOCKSS
Facilitating collaborative replication and preservation with technology…
- Collaborators declare explicit non-uniform resource commitments
- Policy records commitments, storage network properties
- Storage layer provides replication, integrity, freshness, versioning
- SafeArchive software provides monitoring, auditing, and provisioning
- Content is harvested through HTTP (LOCKSS) or OAI-PMH
- Integration of LOCKSS, The Dataverse Network, TRAC
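For readers unfamiliar with OAI-PMH harvesting, a request is an ordinary HTTP GET with a `verb` parameter. The sketch below builds a `ListRecords` request URL using the protocol's standard parameter names (`verb`, `metadataPrefix`, `set`, `from`); the base URL is a hypothetical placeholder, and this is not SafeArchive's own harvesting code.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None, from_date=None):
    """Build an OAI-PMH ListRecords request URL for a repository endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec      # optional: restrict to one set
    if from_date:
        params["from"] = from_date    # optional: selective harvest by date
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint, used only for illustration.
url = list_records_url("https://archive.example.org/oai", from_date="2011-01-01")
print(url)
```

A harvester would fetch this URL, parse the XML response, and follow any resumption token the repository returns; those steps are omitted here.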
Aligning Incentives
Stakeholders & Information Flow
[Figure labels: Data Collection; Publication of Research Products]
Data Citation as a Leverage Point
- Services
  - Identifiers to specific fixed versions of data are needed to establish unambiguous chains of provenance
  - Identifiers that can be globally resolved to machine-understandable metadata and to the identified object are needed to build generalized access and analysis services
  - Persistence of identifiers is needed to maintain long-term access
- Incentives
  - Scholarly credit (intellectual attribution) is a large motivator for many researchers: citation creates an incentive for researchers to publish data
  - Scholars also comply with enforceable journal policies: requiring data citation is a lightweight method to make data access policies auditable
  - Impact/usage is a motivator for public research funders: data citation provides a foundation for measures of usage and impact
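One way to make “specific fixed versions” concrete is to record a digest of the data next to its identifier. The sketch below assumes a plain SHA-256 over the file bytes as a stand-in; it is not the fingerprint method listed under Related Work, and the identifier string and sample data are hypothetical.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return a hex SHA-256 digest of the raw bytes."""
    return hashlib.sha256(data).hexdigest()


def verify(data: bytes, recorded: str) -> bool:
    """Check that the bytes in hand match the digest recorded at citation time."""
    return fingerprint(data) == recorded


# Two hypothetical versions of a tiny data file; one value differs.
version_1 = b"id,income\n1,52000\n2,61000\n"
version_2 = b"id,income\n1,52000\n2,61500\n"

# Hypothetical registry binding an identifier to one fixed version.
registry = {"hdl:EXAMPLE/1234;v1": fingerprint(version_1)}

print(verify(version_1, registry["hdl:EXAMPLE/1234;v1"]))
print(verify(version_2, registry["hdl:EXAMPLE/1234;v1"]))
```

A byte-level digest also changes when only the file format changes, which is one reason a format-independent fingerprint may be preferable in practice.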
Common Principles
Thanks to 37 Participants
What is a citation?
Workflow
Workflow
Design Principles
- Separate scientific principles, use cases, requirements
- Distinguish syntax and semantics from presentation
- Design for ecosystem & lifecycle
- Incremental value for incremental effort
- Think Globally, Act Locally
Theory
Theory +
- Data citations should be first-class objects for publication: they should appear with other citations and be as easy to cite as other works
- At minimum, all data necessary to understand, assess, and extend the conclusions in a scholarly work should be cited
- Citations should persist and enable access to a fixed version of the data for at least as long as the citing work
- Data citation should support unambiguous attribution of credit to all contributors, possibly through the citation ecosystem
Theory + Practice
Use Cases
Use Cases (details)
Operational Constraints?
- Syntax
- Interoperability
- Technical contexts of use
Actors
Simple Proposal
- Semantic: Persistent ID, Author, Title, Version (or at least date)
- Presentation:
  - Any style
  - Grouped with other references
  - Actionable in context
- Policy:
  - Treat data cites as first class
  - If it’s needed to support a claim, cite it
  - Offer credit to contributors
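The semantic elements above are few enough to capture in a small record type. This is a minimal sketch, assuming hypothetical field names and one arbitrary rendering style; the proposal itself allows any presentation style, and the sample identifier, authors, and title are made up.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class DataCitation:
    """Minimal semantic elements: persistent ID, author, title, version or date."""
    persistent_id: str
    authors: Tuple[str, ...]
    title: str
    version: Optional[str] = None
    date: Optional[str] = None

    def __post_init__(self):
        # The proposal asks for a version, or at least a date.
        if not (self.version or self.date):
            raise ValueError("a data citation needs a version, or at least a date")

    def render(self) -> str:
        """Render in one arbitrary style; presentation is not fixed by the proposal."""
        fixed = f"Version {self.version}" if self.version else self.date
        parts = ["; ".join(self.authors), f'"{self.title}"', fixed, self.persistent_id]
        return ". ".join(parts)


# Hypothetical citation, for illustration only.
cite = DataCitation(
    persistent_id="hdl:EXAMPLE/5678",
    authors=("Doe, Jane", "Roe, Richard"),
    title="Example Survey Data",
    version="2",
)
print(cite.render())
```

Keeping the semantic record separate from `render` mirrors the design principle of distinguishing semantics from presentation: a journal can restyle the output without touching the stored elements.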
Discussion
- We cannot depend on a single tool: what are the plans for integration and interoperability through citations and linking mechanisms, interchange formats, ontology hooks, protocols?
- A large portion of the benefit from data sharing arises from open access: how can OpenSHAPA “nudge” researchers toward Open Data?
- Individual researchers cannot ensure long-term access: how will OpenSHAPA fit in the institutional ecosystem?
Contact
Micah Altman
futurelib.org


Editor's Notes

  • #2 This work by Micah Altman (http://redistricting.info) is licensed under the Creative Commons Attribution-Share Alike 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.