
Data Sharing & Data Citation


Prepared for data coding, analysis, archiving, and sharing for open collaboration; National Science Foundation, Sept 15-16, 2011

  • This work by Micah Altman (http://redistricting.info) is licensed under the Creative Commons Attribution-Share Alike 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.
  • Data Sharing & Data Citation

    1. Data Sharing & Data Citation
       Micah Altman, Institute for Quantitative Social Science, Harvard University
       Prepared for data coding, analysis, archiving, and sharing for open collaboration; NSF, Sept 15-16, 2011
    2. Collaborators*
       • Margaret Adams, George Alter, Leonid Andreev, Ed Bachman, Adam Buchbinder, Ken Bollen, Bryan Beecher, Steve Burling, Tom Carsey, Kevin Condon, Jonathan Crabtree, Merce Crosas, Darrell Donakowski, Myron Gutmann, Gary King, Patrick King, Tom Lipkis, Freeman Lo, Jared Lyle, Marc Maynard, Nancy McGovern, Amy Pienta, Lois Timms-Ferrara, Akio Sone, Bob Treacy, Copeland Young
       • Research Support
           • Thanks to the Library of Congress (PA#NDP03-1), the National Science Foundation (DMS-0835500, SES 0112072), IMLS (LG-05-09-0041-09), the Harvard University Library, the Institute for Quantitative Social Science, the Harvard-MIT Data Center, and the Murray Research Archive.
       * And co-conspirators
    3. Related Work
       • M. Altman and J. Crabtree, 2011. "Using the SafeArchive System: TRAC-Based Auditing of LOCKSS", Proceedings of Archiving 2011.
       • M. Crosas, 2011. "The Dataverse Network: An Open-Source Application for Sharing, Discovering and Preserving Data", D-Lib Magazine 17(1/2).
       • M. Altman, M. Adams, J. Crabtree, D. Donakowski, M. Maynard, A. Pienta, and C. Young, 2009. "Digital preservation through archival collaboration: The Data Preservation Alliance for the Social Sciences", The American Archivist 72(1): 169-182.
       • M. Gutmann, M. Abrahamson, M.O. Adams, M. Altman, C. Arms, K. Bollen, M. Carlson, J. Crabtree, D. Donakowski, G. King, J. Lyle, M. Maynard, A. Pienta, R. Rockwell, L. Timms-Ferrara, and C. Young, 2009. "From Preserving the Past to Preserving the Future: The Data-PASS Project and the challenges of preserving digital social science data", Library Trends 57(3): 315-33.
       • M. Altman, 2008. "A Fingerprint Method for Verification of Scientific Data", in Advances in Systems, Computing Sciences and Software Engineering (Proceedings of the International Conference on Systems, Computing Sciences and Software Engineering 2007), Springer Verlag.
       • M. Altman and G. King, 2007. "A Proposed Standard for the Scholarly Citation of Quantitative Data", D-Lib 13(3/4) (March/April).
       • G. King, 2007. "An Introduction to the Dataverse Network as an Infrastructure for Data Sharing", Sociological Methods and Research, Vol. 32, No. 2, pp. 173-199.
    4. Motivations
    5. Access to Data is the Foundation of Science
       • Science is not (only) about being scientific
       • Scientific progress requires community: competition and collaboration in the pursuit of common goals
       • Without access to the same materials, no community exists … data is the nucleus of scientific collaboration
       • The value of an article that can't be replicated: ?
       • Scholarly articles are summaries, not the actual research results
       • Experimental data are expensive to reproduce; observational data are impossible to reproduce
       • Hard for journal editors to verify: if you find the data, how do you know it's the same?
       • Replication projects show that many published articles cannot be replicated … data is needed for scientific replication
       Sources: Fienberg et al. 1985; ICSU 2004; Nature 2009
    6. Open Data Broadens & Deepens Impact
       • Data Intensive Science
           • Increased opportunities for interdisciplinarity
           • Science modeling across multiple scales
           • Continuous, complete, fine-grained information on physical processes, systems, human behavior
       • Education
           • Data eases transition from education to research
       • Open Data Democratizes Science
           • Citizen-scientist
           • Developing countries
           • Researchers outside of inner-circle of institution
           • Crowd-sourcing, open notebooks, and mashups
       & Data Sharing Increases Publication Impact [Gleditsch 2003; Wilson 2008; Piwowar 2007]
    7. Data is Key to Government
       • Statistics = state-istics
       • Reformers use data
           • To assess the performance of the state
           • To assess social conditions
           • Governments attempt to control access to data to evade accountability
       • Policy debates often center on data
           • War on poverty, civil rights, consumer protection: all made heavy use of statistical arguments
           • Economic and environmental policies are data-intensive
       • Data access brings together both sides of the political spectrum
           • In a modern democracy the public needs a direct source of information
           • Liberals and conservatives support access to data informing policy
       Source: "Propaganda" http://www.media-studies.ca/articles/images/berlin_wall.jpg
       Sources: Gough 2003; Shulman 2006; Wagner & Steinzor 2006; Alonzo and Starr 1988
    8. Open Data is "Research Insurance"
       • Keeps options open after the nominal end of the project; extends the lifecycle
           • Continuation projects
           • Publication revisions
           • Broader research programs
       • Insures against loss of "project memory"
           • Departure of senior personnel from the institution
           • Departure of post-docs and graduate students
           • Accidental loss of data due to local IT failures
           • Reduces questions from secondary analysts
       • Insures against intentional and unintentional errors
           • All collaborators can verify results prior to publication
           • Enables more intensive peer review
       Source: Berman et al. 2008
    9. Data Sharing Across Communities
       • Data sharing practices vary greatly across communities
           • Proprietary
           • Formal sharing
           • Formal deposit
           • Significant correlates: tacit knowledge, individual investment of time in data collection, confidentiality, journal practices, funder policies & practices
       [Micah Altman, 10/6/2009] Open Data
       Source: R.I.N. 2008; also see Borgman 2007; Niu 2006
    10. So when do things go wrong?
       Source: Reich & Rosenthal 2005
    11. Confidentiality Restrictions for Personal Private Information
       • Overlapping laws differ in:
           • People/subjects covered
           • Organizations covered
           • Required technical and procedural controls
           • Definition of identifiability
       • Some Strategies
           • Consent for sharing up front
           • Commercialize
           • Observe public activity
           • Share aggregates only
           • De-identify
       • Recent Statistical Results (Oversimplified)
           • De-identification often leaks
           • Aggregation sometimes leaks
       Not included: EU directives, foreign laws, ANPRM Request for Comment on proposed revisions to 45 CFR 46 (www.hhs.gov/ohrp/humansubjects/anprm2011page.html)
    12. Integrating Tools
    13. Data Management - Goals
    14. Data Management Elements
    15. Core Requirements for Data Sharing Infrastructure
       • Stakeholder incentives
           • recognition; citation; payment; compliance; services
       • Dissemination
           • access to metadata; documentation; data
       • Access control
           • authentication; authorization; rights management
       • Provenance
           • chain of control; verification of metadata, bits, semantic content
       • Persistence
           • bits; semantic content; use
       • Legal protection
           • rights management; consent; record keeping; auditing
       • Usability
           • discovery; deposit; curation; administration; collaboration
       • Business model
       Sources: King 2007; ICSU 2004; NSB 2005
    16. Why is Infrastructure for Data Sharing Necessary?
       • Accessibility:
           • Many large data sets: in public archives
           • Most data in published articles: not accessible, results not replicable without the original author
           • Most data sets from federal grants: not publicly available
       • Problems with discovery and linking even with professional archives:
           • Data in different archives have different identifiers
           • Archives change identifiers, links
           • Changes to data are made; identifiers are reused or removed; old data are lost
           • Locating/browsing/extracting requires specialized tools & approaches
       • Sharing data requires exposing tacit knowledge
           • Explicit documentation of data structure, collection process, interpretation
           • Harmonizing/linking to known ontologies, metadata schemas, vocabularies
       • Data sets are not preserved like books
           • Static data files (even if on the web): unreadable after a few years
           • When storage methods change: some data sets are lost; others have altered content!
       • Why not a single centralized infrastructure?
           • Single point of failure
           • Difficult when data are heterogeneous in format, origin, size, effort needed to collect or analyze, legal access rules, etc.
           • Data producers want credit, control, and visibility
    17. Dataverse: For Organizations; For Scholars
       • Brand it like your own website.
       • Upload any type of data.
       • Establish a persistent data citation
       • Facilitate data discovery
       • Provide live analysis
       • Receive permanent storage space
       • Used by archives, libraries, journals, schools
       • Enable contributors to upload data
       • Organize studies by collections
       • Search across a universe of data
       • Control access and terms of use
       • Federate with catalogs and partners: OAI-PMH, LOCKSS, Z39.50, DDI
       • Gateway to over 39000 social science studies (world's largest catalog)
       • Web Virtual Hosting 2.0 Service: over 350 virtual archives
       • Federated search and delivery
    18. Virtual Archive: Scholar Site
       • Scholar retains control over branding and dissemination
       • Preservation and long-term access are guaranteed
       • Dissemination and compliance with Data Management Plans are verifiable
       • Integrates with OpenScholar
    19. Interoperability & Integration
    20. Mind the Gaps
       • GAP: Coverage across the entire lifecycle; decoupling of dissemination, formal publication, long-term access, reuse
       • GAP: Interoperability and integration across tools
       • GAP: Maturity and sustainability of tools; most tools have small communities of maintainers, which is particularly worrisome given the lack of interoperability
       Diagram labels (lifecycle stages): design, publishing, dissemination, archiving, reuse, collection, processing, integration, analysis
       Diagram labels (tools): CATI / CAPI; enhanced publication (Sweave); identifiers; Google-__________; data archives, hosting, networks; general digital libraries and repositories; scientific workflow systems
    21. Supporting Institutions
    22. Institutional Data Access Strategies*
       • "Ignore it, maybe someone else will take care of it"
           • (Internet Archive, …)
       • "We'll always be here"
           • (self-preservation)
       • Let the publishers do it
       • "We are ever true to [Insert Alma Mater]"
           • (institutional archives)
       • "Ask us (domain archive) to do it"
           • (ICPSR, MRA, Roper, …)
       • "Ask someone(s) else to do it"
           • (Data-PASS, Meta-Archive, CLOCKSS)
       • "Trust No One"
           • (LOCKSS)
       *All quotes are entirely fictional :-)
    23. Institutional Preservation Strategies -- Corollaries
       • There are potential single points of failure in technology, organization, and legal regimes:
           • Diversify your portfolio: multiple software systems, hardware, organization (e.g., Data-PASS :-)
           • Seek international partners
       • Many combinations of preservation & dissemination strategies are compatible:
           • Layer technologies and strategies
           • Leverage dissemination (in a planned way) for preservation (and vice-versa)
       • Preservation is impossible to demonstrate conclusively:
           • Consider organizational credentials
           • No organization is absolutely certain to be reliable
    24. Data-PASS
       • Partnership Agreements
           • MOU
           • Succession Plans & Agreements
       • Coordinating Operations
           • Development of shared procedures
       • Joint "Not-bad" practices
           • Identification & selection
           • Metadata
           • Confidentiality
       • Shared Catalog
           • Unified Discovery
           • Content replication
       Data-PASS is a broad-based partnership of data archives dedicated to acquiring and preserving data at risk of being lost to the social science research community. Data-PASS partners have rescued thousands of data sets and created the largest catalog of social science data in existence. Data-PASS partners collaborate to identify and promote good archival practices, seek out at-risk research data, build preservation infrastructure, and mutually safeguard
    25. Ideal integration of policy and technology?
       • Expressed in high-level domain/business language
       • Captures a significant portion of business domain
       • Translated to a formal schematization
       • Automatically measurable
       • Directly controls procedures & actions to achieve compliance
       • Verifiable translation from business domain policy
       Policy: a set of rules and objectives expressed at a high level domain that controls actions at a lower level
    26. "The repository system must be able to identify the number of copies of all stored digital objects, and the location of each object and their copies."
       Diagram labels: Policy; Schematization; Behavior (Operationalization)
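A minimal sketch of how the quoted requirement might move from policy to schematization to measurable behavior. This is an illustration under stated assumptions, not the SafeArchive implementation: the `ReplicationPolicy` class, the `audit` function, the `min_copies` threshold, and the host names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicationPolicy:
    """Hypothetical schematization of the policy: a minimum number of copies."""
    min_copies: int


def audit(inventory: dict[str, list[str]], policy: ReplicationPolicy) -> dict[str, list[str]]:
    """Return every object stored at fewer distinct locations than the policy requires.

    `inventory` maps an object identifier to the locations that report holding a copy.
    Duplicate reports from the same location count once.
    """
    failures = {}
    for obj, locations in inventory.items():
        distinct = sorted(set(locations))
        if len(distinct) < policy.min_copies:
            failures[obj] = distinct
    return failures


# Hypothetical inventory: object IDs and host names are invented for illustration.
inventory = {
    "study-001": ["host-a", "host-b", "host-c"],
    "study-002": ["host-a", "host-a"],
}
report = audit(inventory, ReplicationPolicy(min_copies=3))
print(report)
```

The point of the sketch is that the same inventory answers both halves of the requirement: the count of copies (length of the location list) and the location of each copy.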
    27. SafeArchive: TRAC-Based Management of LOCKSS
       • Facilitating collaborative replication and preservation with technology…
       • Collaborators declare explicit non-uniform resource commitments
       • Policy records commitments, storage network properties
       • Storage layer provides replication, integrity, freshness, versioning
       • SafeArchive software provides monitoring, auditing, and provisioning
       • Content is harvested through HTTP (LOCKSS) or OAI-PMH
       • Integration of LOCKSS, The Dataverse Network, TRAC
    28. Aligning Incentives
    29. Stakeholders & Information Flow
       Diagram labels: Data Collection; Publication of Research Products
    30. Data Citation as a Leverage Point
       • Services
           • Identifiers to specific fixed versions of data are needed to establish unambiguous chains of provenance
           • Identifiers that can be globally resolved to machine-understandable metadata and to the identified object are needed to build generalized access and analysis services
           • Persistence of identifiers is needed to maintain long-term access
       • Incentives
           • Scholarly credit (intellectual attribution) is a large motivator for many researchers; citation creates an incentive for researchers to publish data
           • Scholars also comply with enforceable journal policies; requiring data citation is a light-weight method to make data access policies auditable
           • Impact/usage is a motivator for public research funders; data citation provides a foundation for measures of usage and impact
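One way to tie an identifier to a specific fixed version of a data set is to record a content fingerprint next to it. The sketch below is an assumption-laden illustration, not the fingerprint method cited under Related Work: it hashes a simple canonical text form of each row with SHA-256, and the `fingerprint` function, the `digits` parameter, and the sample rows are all hypothetical.

```python
import hashlib


def fingerprint(rows, digits=7):
    """Hash a canonical text form of tabular data.

    Floats are written in scientific notation with a fixed number of digits so
    that tiny representation differences do not change the result. This is an
    illustrative canonicalization, not a published standard.
    """
    h = hashlib.sha256()
    for row in rows:
        cells = []
        for value in row:
            if isinstance(value, float):
                cells.append(format(value, f".{digits}e"))
            else:
                cells.append(str(value))
        h.update(("\t".join(cells) + "\n").encode("utf-8"))
    return h.hexdigest()


# Invented sample data: v1 and v1_copy differ only by floating-point noise.
v1 = [(1, 0.1 + 0.2), (2, 1.5)]
v1_copy = [(1, 0.3), (2, 1.5)]
v2 = [(1, 0.3), (2, 1.6)]

print(fingerprint(v1) == fingerprint(v1_copy))
print(fingerprint(v1) == fingerprint(v2))
```

A reader who later retrieves the data can recompute the fingerprint and compare it with the recorded one, which addresses the "how do you know it's the same?" question in a mechanical way.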
    31. Common Principles
    32. Data Sharing & Data Citation
    33. Thanks to 37 Participants
    34. What is a citation?
    35. Data Sharing & Data Citation
    36. Workflow
    37. Workflow
    38. Design Principles
       • Separate scientific principles, use cases, requirements
       • Distinguish syntax and semantics from presentation
       • Design for ecosystem & lifecycle
       • Incremental value for incremental effort
       • Think Globally, Act Locally
    39. Theory
    40. Theory +
       • Data citations should be first class objects for publication: they should appear with citations and be as easy to cite as other works
       • At minimum, all data necessary to understand, assess, and extend conclusions in scholarly work should be cited
       • Citations should persist and enable access to a fixed version of the data at least as long as the citing work
       • Data citation should support unambiguous attribution of credit to all contributors, possibly through the citation ecosystem
    41. Theory + Practice
    42. Use Cases
    43. Use Cases (details)
       Operational Constraints?
       • Syntax
       • Interoperability
       • Technical contexts of use
    44. Actors
    45. Simple Proposal
       • Semantic: Persistent ID, Author, Title, Version (or at least date)
       • Presentation: Any style; grouped with other references; actionable in context
       • Policy: Treat data cites as first class; if it's needed to support a claim, cite it; offer credit to contributors
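The semantic elements of the simple proposal can be sketched as a small record with one possible presentation. This is a hypothetical illustration: the `DataCitation` class, its `render` method, the output style, and every example value (including the placeholder identifier) are assumptions, since the proposal explicitly allows any presentation style.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataCitation:
    """The four semantic elements named in the proposal."""
    authors: tuple[str, ...]
    title: str
    version: str          # the proposal allows a date where no version exists
    persistent_id: str


    def render(self) -> str:
        """One possible presentation; any reference style would satisfy the proposal."""
        names = "; ".join(self.authors)
        return f"{names}. {self.title}. Version {self.version}. {self.persistent_id}"


# All values below are invented placeholders, not a real data set.
cite = DataCitation(
    authors=("A. Researcher", "B. Collaborator"),
    title="Example Survey Data",
    version="2",
    persistent_id="doi:10.0000/example",
)
print(cite.render())
```

Keeping the semantic fields separate from `render` mirrors the design principle of distinguishing semantics from presentation: a journal can restyle the output without touching the underlying record.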
    46. Discussion
       • We cannot depend on a single tool: what are the plans for integration and interoperability through citations and linking mechanisms, interchange formats, ontology hooks, protocols?
       • A large portion of the benefit from data sharing arises from open access: how can OpenShare "nudge" researchers toward Open Data?
       • Individual researchers cannot ensure long-term access: how will OpenShapa fit in the institutional ecosystem?
    47. Contact
       • Micah Altman
       • futurelib.org
