Needs for Data Management & Citation Throughout the Information Lifecycle

Prepared for the NISO Forum: Tracking it Back to the Source: Managing and Citing Research Data.


September 2012

  • This work, by Micah Altman (http://micahaltman.com), is licensed under the Creative Commons Attribution-Share Alike 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.
  • Most stakeholders have a strong relationship with, or stake in, research at only some stages of the lifecycle. Researchers and research institutions are in the middle: they have a strong stake in most stages. Researchers are more directly concerned with collection, processing, analysis, and dissemination. Organizations have a higher stake in internal sharing, re-use, and long-term access.
  • This section is a more detailed deep dive into drivers at major stages of the information lifecycle. It is not intended to be part of the main presentation, but could be used to respond to questions or to focus on a particular stage.

Transcript

  • 1. Prepared for NISO Forum: Tracking it Back to the Source: Managing and Citing Research Data, September 2012. Needs for Data Management & Citation Throughout the Information Lifecycle. Micah Altman, Director of Research, MIT Libraries
  • 2. Collaborators and Co-Conspirators • Jonathan Crabtree, Merce Crosas, Gary King, Tom Lipkis, Nancy McGovern, John Willinsky • Research Support – Library of Congress (PA#NDP03-1) – National Science Foundation (DMS-0835500, SES 0112072) – Institute of Museum and Library Services (LG-05-09-0041-09) – Sloan Foundation – Amazon Web Services – Massachusetts Institute of Technology Needs for Data Management & Citation 2
  • 3. Related Work. Reprints available from: http://maltman.hmdc.harvard.edu • Altman, M. 2012. Data Citation in The Dataverse Network ®. In P. F. Uhlir (Ed.), Developing Data Attribution and Citation Practices and Standards: Report from an International Workshop. National Academies Press. Forthcoming. • Altman, M., & Crabtree, J. 2011. Using the SafeArchive System: TRAC-Based Auditing of LOCKSS. Archiving 2011 (pp. 165–170). Society for Imaging Science and Technology. • Altman, M., Adams, M., Crabtree, J., Donakowski, D., Maynard, M., Pienta, A., & Young, C. 2009. Digital preservation through archival collaboration: The Data Preservation Alliance for the Social Sciences. The American Archivist, 72(1): 169–182. • Altman, M. 2008. A Fingerprint Method for Verification of Scientific Data. In Advances in Systems, Computing Sciences and Software Engineering (Proceedings of the International Conference on Systems, Computing Sciences and Software Engineering 2007). Springer-Verlag. • Altman, M., & King, G. 2007. A Proposed Standard for the Scholarly Citation of Quantitative Data. D-Lib, 13(3/4), March/April.
  • 4. Preview • Principled approach to data management • Lifecycle data management planning • Lifecycle data management tracking • Lifecycle data management infrastructure • [Exemplar Projects]
  • 5. (Some) Timely Challenges
  • 6. “Data science is suddenly sexy – does that mean data is the new black?”
  • 7. Valuable Data is Lost • Researchers lack archiving capability • Incentives for data sharing are weak. Examples: Intentionally Discarded: “Destroyed, in accord with [nonexistent] APA 5-year post-publication rule.” Unintentional Hardware Problems: “Some data were collected, but the data file was lost in a technical malfunction.” Acts of Nature: “The data from the studies were on punched cards that were destroyed in a flood in the department in the early 80s.” Discarded or Lost in a Move: “As I retired …. Unfortunately, I simply didn’t have the room to store these data sets at my house.” Obsolescence: “Speech recordings stored on a LISP Machine…, an experimental computer which is long obsolete.” Simply Lost: “For all I know, they are on a [University] server, but it has been literally years and years since the research was done, and my files are long gone.” Research by:
  • 8. Unpublished Data Ends up in the “Desk Drawer” • Null results are less likely to be published • Outliers are routinely discarded. (Image: Daniel Shechtman’s lab notebook providing initial evidence of quasicrystals)
  • 9. Data Behind Publications Unavailable for Review, Reuse, Replication
  • 10. Model Science: “Citations to unpublished data and personal communications cannot be used to support claims in a published paper” “All data necessary to understand, assess, and extend the conclusions of the manuscript must be available to any reader of Science.”
  • 11. Compliance with Policies is Low • Compliance is low even in the best examples of journals • Checking compliance manually is tedious and doesn’t scale
  • 12. Special Challenges for Long-Term Access to New Forms of Data • Some Examples – GIS and geospatial trails – Facebook & social networks – Text: blogs, tweets – Cell phone data • Challenges – Proprietary – Intellectual property – Size – Dynamic content – Fixity – Format. Source: [Calberese 2008]
  • 13. A Lifecycle Framework
  • 14. “The published article is not scientific output – it’s a summary of scientific output.” -- corollary of Buckheit & Donoho 1995
  • 15. Information Lifecycle (cycle diagram). Stages: Creation/Collection; Storage/Ingest; Processing; Internal Sharing; Analysis; External dissemination/publication; Re-use (• Scientific • Educational • Scientometric • Institutional); Long-term access
  • 16. Stakeholders (overlaid on the information lifecycle diagram): Data Sources/Subjects; Consumers; Data Archives/Publisher; Researchers; Research Sponsors; Research Organizations; Scholarly Publishers; Service/Infrastructure Providers
  • 17. Legal Requirements and Rights (diagram labels): Contract; Intellectual Property; Trade Secret; Click-Wrap; Patent; Intellectual Attribution; TOU; License; Moral Rights; Database Rights; Journal Replication Requirements; Funder Open Access; Copyright; DMCA; Trademark; Fair Use; Rights of Publicity; Common Rule 45 CFR 46; HIPAA; FOIA; EU Privacy Directive; FERPA; Privacy Torts (Invasion, Defamation); State FOI Laws; CIPSEA; State Privacy Laws; Potentially Harmful (Archeological Sites, Animal Testing, …); Classified; Sensitive but Unclassified; Access Rights; Confidentiality; EAR; ITAR
  • 18. Stakeholders, Rights and Requirements (diagram placing stakeholders among the legal rights and requirements). Stakeholders: Scholarly Publishers; Consumers (secondary research, participative science, public policy uses); Infrastructure/Service Providers; Primary Researchers; Research Organizations; Data Archives; Research Sponsors; Sources/Subjects. Rights and requirements: Contract; Intellectual Property; Trade Secret; Click-Wrap; Patent; Intellectual Attribution; TOU; License; Moral Rights; Database Rights; Journal Replication Requirements; Funder Open Access; Copyright; DMCA; Trademark; Fair Use; Rights of Publicity; Common Rule 45 CFR 46; HIPAA; FOIA; EU Privacy Directive; FERPA; Privacy Torts (Invasion, Defamation); State FOI Laws; CIPSEA; State Privacy Laws; Potentially Harmful (Archeological Sites, Animal Testing, …); Classified; Sensitive but Unclassified; Access Rights; Confidentiality
  • 19. Stakeholder Drivers per Stage of Information Lifecycle. Stage: Research Proposal, Design and Data Collection • Subjects – Legal constraints: consent/contract – Concerns: public benefit; privacy; future access to own information • Sources – Legal constraints: intellectual property; contract – Concerns: business confidentiality; IP; profit from licenses • Funder – Legal constraints: open access; confidentiality – Concerns: public benefit; policy relevance; reproducible research; future access • Primary Researcher – Legal constraints: confidentiality; contract; IP – Concerns: publication potential; compliance with institutional/funder requirements • Research Institution – Legal constraints: confidentiality; contract; IP – Concerns: compliance with funder requirements; license, IP, confidentiality compliance
  • 20. Stakeholder Drivers per Stage of Information Lifecycle. Stage: Data Storage, Analysis (Pre-publication) • Primary Researcher – Legal constraints: confidentiality; contract; IP – Concerns: publication potential; compliance with institutional/funder requirements • Research Institution – Legal constraints: confidentiality; contract; IP – Concerns: license, IP, confidentiality compliance; records management • Service Providers – Legal constraints: contract; (selected cases) confidentiality requirements – Concerns: contract; service business model; service deployment
  • 21. Stakeholder Drivers per Stage of Information Lifecycle. Stage: Publication • Primary Researcher – Legal constraints: compliance for source/subjects, sponsor, host institution, publisher – Concerns: scholarly attribution/credit; promote use of research; track use/impact of research • Sponsor – Concerns: track research products; track compliance; track use/impact • Research Institution – Legal constraints: sponsor compliance; records management; intellectual property – Concerns: track OA products • Scholarly/Journal Publisher – Legal constraints: IP; contract – Concerns: impact/use; profit/business model; replicability • Data Publisher – Legal constraints: IP – Concerns: profit/business model; replicability; connection to publication
  • 22. Stakeholder Drivers per Stage of Information Lifecycle. Stage: (Re)use • Research Reader – Legal constraints: access rights – Concerns: provenance • Secondary Researcher – Legal constraints: access rights; confidentiality; contract – Concerns: replicability; data reintegration/reanalysis; linking publications and data; provenance • “Citizen/Community Scientist” – Legal constraints: access rights – Concerns: data redissemination/reanalysis; linking publications and data • Public Policy – Legal constraints: access rights – Concerns: provenance; replicability; linking publications and data • Education – Legal constraints: access rights – Concerns: “classroom” use/teaching; MOOC use
  • 23. Lifecycle Management: Data Management Planning
  • 24. Some Formal “DMP” Requirements • The Final NIH Statement on Sharing Research Data – was published in the NIH Guide on February 26, 2003: “Starting with the October 1, 2003 receipt date, investigators submitting an NIH application seeking $500,000 or more in direct costs in any single year are expected to include a plan for data sharing or state why data sharing is not possible.” – No later than the main findings from the final data set are accepted for publication • NSF: All proposals must (as of 1/1/2011) include a data management plan. – Specific requirements are vague, for the most part: “will be determined by the community of interest through the process of peer review and program management.” • Wellcome Trust: – “will review data management and sharing plans, and any costs involved in delivering them, as an integral part of the funding decision”
  • 25. DMP Goals • Orchestrate data for current use • Control disclosure • Comply with contracts, regulations, law, and policy • Maximize value of information assets • Ensure short term and long term dissemination
  • 26. DMP Elements • Orchestrate data for current use – Data description – Quality Assurance – Storage, backup, replication, and versioning – Data Formats – Data Organization – Budget – Metadata and documentation • Control disclosure – Access and Sharing – Intellectual Property Rights – Legal Requirements – Security • Compliance with contracts, regulations, law, and policy – Access and Sharing – Adherence – Responsibility – Ethics and privacy – Security • Value of information assets – Data value – Relation to collection – Relation to evidence base – Budget • Ensure short term and long term dissemination – Data description – Institutional Archiving Commitments – Audience – Access and Sharing – Data Formats – Data Organization – Metadata and documentation – Budget
  • 27. DMP Details • Sharing – Plans for depositing in an existing public database – Access procedures – Embargo periods – Access charges – Timeframe for access – Technical access methods – Restrictions on access – Restrictions on use • Long term access (Preservation) – Requirements for data destruction, if applicable – Procedures for long term preservation – Institution responsible for long-term costs of data preservation – Succession plans for data should archiving entity go out of existence • Formats – Generation and dissemination formats and procedural justification – Storage format and archival justification – Format documentation • Metadata and documentation – Internal and External Identifiers and Citations – Metadata to be provided – Metadata standards used – Planned documentation and supporting materials – Quality assurance procedures for metadata and documentation • Data Organization – File organization – Naming conventions • Storage, backup, replication, and versioning – Facilities – Methods – Procedures – Frequency – Replication – Version management – Recovery guarantees • Budget – Cost of preparing data and documentation – Cost of storage and backup – Cost of permanent archiving and access • Intellectual Property Rights – Entities who hold property rights – Types of IP rights in data – Protections provided – Dispute resolution process • Legal Requirements – Provider requirements and plans to meet them – Institutional requirements and plans to meet them • Responsibility – Individual or project team role responsible for data management – Qualifications, certifications, and licenses of responsible parties • Ethics and privacy – Informed consent – Protection of privacy – Data use agreements – Other ethical issues • Adherence – When will adherence to data management plan be checked or demonstrated – Who is responsible for managing data in the project – Who is responsible for checking adherence to data management plan – Auditing procedures and framework • Value of information assets – Project use value – Institutional audience and uses – Public audience and uses – Relation to institutional collection – Relation to disciplinary evidence base – Cost of re-creating data • Security – Procedural controls – Technical Controls – Confidentiality concerns – Access control rules
  • 28. Approaching Requirement Overlap • Sanity-check DMP details with lifecycle questions: – Who wants it? – What do they need it for? – When will it be used? • Be conscious of elements that serve multiple goals and/or lifecycle stages – Metadata/documentation – Identifiers – Budgets – Formats – IP Rights and confidentiality restrictions – Responsibilities/Adherence • Use tracking tools and methods throughout lifecycle. This Way…
  • 29. Lifecycle Management: Tracking
  • 30. What do we track? What tools and methods provide technical leverage or incentives to management across lifecycle stages and among actors? • Identification – identifiers, references, citations • Provenance – relationship of delivered data to history of inputs and modifications and actors responsible for these; revision control; versioning • Authenticity: assertions about the provenance of the records • Respect des fonds: assertions about the original organization of the records • Chain of custody: assertions about the ownership of the records • Integrity: assertions about the management of the records; fixity of bits; fixity of semantics • Auditing: verification of properties & policy compliance. Sources: Bulleted list of attributes adapted from Moore 2008
  • 31. Tracking Across Information Lifecycle (lifecycle diagram: Creation/Collection; Storage/Ingest; Processing; Internal Sharing; Analysis; External dissemination/publication; Re-use; Long-term access). Annotations: identifiers; metadata for integrity, provenance, custody; citation
  • 32. Data Citation: a Point of Leverage • Services – Identifiers to specific fixed versions of data are needed to establish unambiguous chains of provenance – Identifiers that can be globally resolved to machine-understandable metadata and to the identified object are needed to build generalized access and analysis services – Persistence of identifiers is needed to maintain long-term access • Incentives – Scholarly credit (intellectual attribution) is a large motivator for many researchers – citation creates an incentive for researchers to publish data – Scholars also comply with enforceable journal policies -- requiring data citation is a light-weight method to make data access policies auditable – Impact/usage is a motivator for public research funders – data citation provides a foundation for measures of usage and impact
  • 33. Emerging Practices for Data Citation • Publishers – OECD iLibrary – Thomson Reuters Data Citation Index • Data archives – Dataverse Network – Data-PASS • Harmonization efforts – DataCite – NAS BRDI – ICSU/CODATA • Discipline specific
  • 34. Identifier and Citation Use Cases. Attribution • Provide scholarly attribution • Provide legal attribution • Identify contributors to data. Verification • Associate work with version of evidence used • Verify fixity of bits • Verify fixity of information • Verify “authenticity” of work. Discovery • Locate data via identifier • Locate data integral to article • Locate works related to data – articles, derivatives, sources. Access • Access to surrogate • On-line access to object • Machine understandability • Long-term understandability. Persistence • Does evidence persist as long as assertions based on it? • Is durability of evidence transparent?
  • 35. Emerging Principles for Data Citation • Data citations should be first class objects for publication -- appear with citations to other works; should be as easy to cite as other works • Citations should persist and enable access to a fixed version of the data at least as long as the citing work exists. • Citations should support unambiguous attribution of credit to all contributors, possibly through the citation ecosystem.
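Slides 32 to 35 ask for citations that name a persistent identifier, a fixed version, and the contributors. A minimal sketch of such a record, assuming hypothetical field names and a placeholder identifier; it is an illustration of the listed elements, not the format of any particular citation standard:

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DataCitation:
    """Hypothetical container for the citation elements named on the slides."""
    authors: List[str]   # contributors to credit
    year: int
    title: str
    identifier: str      # persistent identifier (placeholder scheme below)
    version: str         # label of the fixed version being cited
    fingerprint: str     # fixity value for that version

    def format(self) -> str:
        """Render the citation as one line, like a citation to any other work."""
        names = "; ".join(self.authors)
        return (
            f'{names}. {self.year}. "{self.title}". '
            f"{self.identifier}, version {self.version}, "
            f"fingerprint {self.fingerprint}."
        )


if __name__ == "__main__":
    # All values here are invented placeholders.
    citation = DataCitation(
        authors=["Jane Doe", "Richard Roe"],
        year=2012,
        title="Example Survey Data",
        identifier="hdl:0000/EXAMPLE",
        version="2",
        fingerprint="sha256:abc123",
    )
    print(citation.format())
```

Keeping the version and fingerprint in the citation itself is what lets a reader check that the data retrieved is the data the citing work used.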
  • 36. Fixity • Are files, bitstreams corrupted? • Do semantics remain the same over time, across formats, software analysis systems? Some semantic approaches… Universal Numeric Fingerprint – Canonicalization. Perceptual Signatures – Characterization of Significant Properties
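The two fixity questions on slide 36 can be contrasted in a short sketch. The bit-level check is a plain SHA-256 digest. The semantic check below is a deliberately simplified stand-in for canonicalization (round each number to a fixed count of significant digits, then hash the canonical text); it is not the actual Universal Numeric Fingerprint algorithm, and the function names are hypothetical:

```python
import hashlib
from typing import Iterable


def bit_fingerprint(data: bytes) -> str:
    """Bit-level fixity: changes if any byte of the file changes."""
    return hashlib.sha256(data).hexdigest()


def canonical_number(value: float, digits: int = 7) -> str:
    """Write a number with a fixed count of significant digits in exponent form."""
    return format(value, f".{digits - 1}e")


def semantic_fingerprint(values: Iterable[float], digits: int = 7) -> str:
    """Simplified semantic fixity: hash the canonical form, not the file bytes."""
    canonical = "\n".join(canonical_number(v, digits) for v in values)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    file_a = b"1.50,2.0"   # same numbers,
    file_b = b"1.5,2.00"   # different formatting
    values_a = [float(x) for x in file_a.decode().split(",")]
    values_b = [float(x) for x in file_b.decode().split(",")]
    print(bit_fingerprint(file_a) == bit_fingerprint(file_b))
    print(semantic_fingerprint(values_a) == semantic_fingerprint(values_b))
```

The two files differ at the bit level but carry the same numbers, so only the canonicalized fingerprint treats them as the same data.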
  • 37. Audit [aw-dit]: An independent evaluation of records and activities to assess a system of controls. Fixity mitigates risk only if used for auditing.
  • 38. Example: Functions of Storage Auditing • Detect corruption/deletion of content • Verify compliance with storage/replication policies • Prompt repair actions
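The three auditing functions on slide 38 can be sketched together. This is a minimal illustration, assuming replicas are modeled as in-memory dictionaries and the policy is a single hypothetical `min_copies` threshold; a real audit would read from storage systems and follow a richer policy:

```python
import hashlib
from typing import Dict, List, NamedTuple, Tuple


class AuditFinding(NamedTuple):
    object_id: str
    intact: List[str]    # sites holding a copy that matches the manifest
    damaged: List[str]   # sites holding a copy with the wrong digest
    missing: List[str]   # sites holding no copy
    compliant: bool      # True if enough intact copies exist


def audit(manifest: Dict[str, str],
          replicas: Dict[str, Dict[str, bytes]],
          min_copies: int) -> List[AuditFinding]:
    """Compare every replica of every object against its expected SHA-256."""
    findings = []
    for object_id, expected in sorted(manifest.items()):
        intact, damaged, missing = [], [], []
        for site, store in sorted(replicas.items()):
            content = store.get(object_id)
            if content is None:
                missing.append(site)                      # deletion detected
            elif hashlib.sha256(content).hexdigest() == expected:
                intact.append(site)
            else:
                damaged.append(site)                      # corruption detected
        findings.append(AuditFinding(object_id, intact, damaged, missing,
                                     len(intact) >= min_copies))
    return findings


def repair_plan(finding: AuditFinding) -> List[Tuple[str, str]]:
    """Suggest (source, target) copies from an intact site to each bad site."""
    if not finding.intact:
        return []  # nothing trustworthy to copy from
    source = finding.intact[0]
    return [(source, target) for target in finding.damaged + finding.missing]
```

The audit only reports and suggests; whether repairs run automatically is a separate policy decision.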
  • 39. Audit Design Choices • Audit regularity and coverage: on-demand (manually); on event; randomized sample; scheduled/comprehensive • Audit procedure, algorithms, certifying authority • Auditing scope: integrity of object; integrity of collection; integrity of network; policy compliance; public/transparent auditing • Trust model • Threat model
  • 40. Lifecycle Management: Infrastructure
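Of the coverage options on slide 39, the randomized sample is the easiest to illustrate. A minimal sketch, assuming a hypothetical `sample_for_audit` helper; the explicit seed is a design choice made here so that a third party can re-derive which objects were selected:

```python
import math
import random
from typing import List, Sequence


def sample_for_audit(object_ids: Sequence[str], fraction: float, seed: int) -> List[str]:
    """Pick a reproducible random subset of objects to audit in this round."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ordered = sorted(object_ids)          # fixed order so the seed alone decides
    if not ordered:
        return []
    count = max(1, math.ceil(fraction * len(ordered)))
    return random.Random(seed).sample(ordered, count)
```

A comprehensive audit is the special case `fraction=1.0`; a scheduled audit would call this with a new seed for each round.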
  • 41. Many Tools, Few Solutions. “Poor carpenters blame their tools” – Proverb. “If all you have is a hammer, everything looks like a nail” – Another Proverb. “Ultimately, some people need holes – but no one needs a drill.” – Yet Another Proverb • Many scientific tools are embedded in needs, perspectives, and practices of specific disciplines • Identify common requirements • Identify gaps across lifecycle stages and among actors
  • 42. Core Requirements for Data Sharing Infrastructure • Stakeholder incentives – recognition; citation; payment; compliance; services • Dissemination – access to metadata; documentation; data • Access control – authentication; authorization; rights management • Provenance – chain of control; verification of metadata, bits, semantic content • Persistence – bits; semantic content; use • Legal protection – rights management; consent; record keeping; auditing • Usability – discovery; deposit; curation; administration; collaboration • Business model. Sources: King 2007; ICSU 2004; NSB 2005
  • 43. Mind the Gaps. Columns: Lifecycle (dissemination, collection, analysis, storage, reuse); Strengths; Other Gaps • Scientific Workflow Software (e.g. Taverna) – Strengths: close integration across supported lifecycle; perceived as useful service by researchers; high performance – Other gaps: discipline-centric; doesn’t address most storage requirements (replication, access control) • Storage Grid/VRE (e.g. iRODS) – Strengths: integration across supported lifecycle; storage is perceived as useful service by researchers; high performance – Other gaps: loose integration of analysis, insufficient for reproducibility • Institutional Repository (e.g. DSpace) – Strengths: low cost; institutional commitment to long-term access – Other gaps: access and discovery mechanisms usually tailored to publications, not data • Reproducible Publications Systems (e.g. StatWeave) – Strengths: close integration of analysis and scientific publication; reduces risk of embarrassment when working with “co-authors”; ensures one form of reproducibility (calibration, mechanical replicability) – Other gaps: addresses replication but not reuse for secondary analysis, integration • “Data Archive” – Strengths: richer support for reuse; often supports cross-discipline discovery; long-term access – Other gaps: varied models (curated database; “virtual archive”; disciplinary repository); often discipline-centric
  • 44. Exemplar Efforts (A.K.A., What have you done for me lately?)
  • 45. Automatic Auditing of Data Replication & Integrity Policies • Audit Data Replication & Integrity Policies. safearchive.org
  • 46. The Distributed Content Replication Problem • We hold digital assets we wish to preserve • Many of these assets are not replicated • Even when replicated, vulnerable to single points of failure because replicas are managed by a single institution. A Partial Solution: LOCKSS – Self-contained OSS – Harvests resources via open interfaces – Replicated through secure P2P protocol – Self-repairing – Zero trust – Used by hundreds of institutions for collaborative preservation. What we needed… Auditing – how many replicas exist, where, and are they current? Policy – prove replication is consistent with policy, like TRAC? Collaboration – coordinate with partners to replicate content?
  • 47. Resilience of peer-to-peer with the accountability of a centralized system. Facilitating collaborative replication and preservation with cyberinfrastructure … • Collaborators declare explicit non-uniform resource commitments • Policy records and schematizes commitments, desired TRAC replication properties • Storage layer provides replication, integrity, freshness, versioning • SafeArchive software provides monitoring, auditing, transparency, and provisioning • Content is harvested through HTTP (LOCKSS) or OAI-PMH • Integration of LOCKSS, Institutional Repositories, TRAC
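Slide 47 describes partners declaring non-uniform storage commitments against a replication policy. One way to reason about whether such commitments can satisfy a policy is a simple greedy placement. This is a hypothetical sketch, not the SafeArchive algorithm; the function name, the size units, and the largest-first heuristic are all assumptions, and a greedy pass can report failure where a cleverer placement would succeed:

```python
from typing import Dict, List, Optional


def place_replicas(commitments: Dict[str, int],
                   collections: Dict[str, int],
                   copies: int) -> Optional[Dict[str, List[str]]]:
    """Assign each collection to `copies` distinct partners, or return None.

    commitments: partner name -> committed storage (arbitrary size units)
    collections: collection name -> size (same units)
    """
    free = dict(commitments)
    placement: Dict[str, List[str]] = {}
    # Place the largest collections first, while the most space is still free.
    for name, size in sorted(collections.items(), key=lambda kv: (-kv[1], kv[0])):
        candidates = sorted(
            (partner for partner in free if free[partner] >= size),
            key=lambda partner: (-free[partner], partner),
        )
        if len(candidates) < copies:
            return None  # commitments cannot meet the policy under this heuristic
        chosen = candidates[:copies]
        for partner in chosen:
            free[partner] -= size
        placement[name] = chosen
    return placement
```

Returning `None` rather than a partial placement keeps the result easy to audit: either every collection meets the declared number of copies or the plan is rejected.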
  • 48. ORCID is an international, interdisciplinary, open, and not-for-profit organization created for the benefit of all stakeholders, including research institutions, funding organizations, publishers, and researchers to enhance the scientific discovery process and improve collaboration and the efficiency of research funding. ORCID aims to solve the name ambiguity problem in scholarly communications by creating a registry of persistent unique identifiers for individual researchers and an open and transparent linking mechanism between ORCID, other ID schemes, and research objects such as publications, grants, and patents. http://orcid.org
  • 49. ORCID Launch to Public in October. ORCID Launch Partners Program participants include research institutions, publishers, research funders, data repositories, and third party providers, such as: The American Physical Society, Aries Systems, Avedas, Boston University, the California Institute of Technology, CrossRef, Elsevier, Faculty of 1000, figshare, Hindawi Publishing Corporation, KNODE, Nature Publishing Group, SafetyLit, Symplectic, Thomson Reuters, Total-Impact, and Wellcome Trust. At Launch, the ORCID Registry will: • Allow researchers and scholars to register for an ORCID identifier, create ORCID records, and manage their privacy settings • Contain ORCID records created by universities on behalf of their researchers and scholars • Allow researchers and scholars to link their ORCID record to external identifiers, including Scopus and ResearcherID • Facilitate synchronization of ORCID identifier record data with external systems including Scopus • Bi-directionally link to a number of author profile and manuscript submission systems, including the American Physical Society, Aries Systems, Hindawi Publishing Corporation, Nature Publishing Group, and Scholar One Manuscripts • Allow researchers and scholars to search and upload publication metadata from CrossRef • (Soon after launch) have the ability to link to grant application systems
  • 50. Data Management Workflows for Open Access Journals. http://bit.ly/DVNOJS
  • 51. Embed Real Data Archives in Journals • Embed remotely managed data archive in OJS journal • Replaces “supplemental materials” • Adds – Online analysis – Independent storage – Persistent identifiers and citation – Data versioning – Enhanced discoverability and interoperability – Format normalization – Fixity and replication
  • 52. Integrated Policies, Workflow, Access • OJS and DVN – Support workflows – Enforce policies – Disseminate content • Integrate policies for – Access and data license – Embargoes – Citation • Coordinate – Submission – Review – Publication • Link – Content – Subscriptions & notifications – Usage Metrics
  • 53. Wrapping Up
  • 54. How will we see the geography of science, when we reveal how research connects through data? Research & Node Layout: Kevin Boyack and Dick Klavans (mapofscience.com); Data: Thompson ISI; Graphics & Typography: W. Bradford Paley (didi.com/brad); Commissioned: Katy Börner (scimaps.org). Seed Magazine, Mar 7, 2007. http://seedmagazine.com/content/article/scientific_method_relationships_among_scientific_paradigms/
  • 55. Summary • Principled approach to data management – Follow information through information lifecycle – Assess stakeholder requirements – Track management, use, impact across lifecycle • Data management planning goals – Orchestrate data for current use – Protect against disclosure – Comply with contracts, regulations, law, and policy – Maximize value of information assets – Ensure short term and long term dissemination • Lifecycle data management tracking – Identification – identifiers, references, citations – Provenance – relationship of delivered data to history of inputs and modifications and actors responsible for these – Authenticity: assertions about the provenance of the records – Chain of custody: assertions about the ownership of the records – Integrity: assertions about the management of the records; fixity of bits; fixity of semantics – Auditing: verification of properties & policy compliance • Data citation is a key leverage point – Services: establish provenance; access; long-term preservation – Incentives: scholarly credit; reproducible research policies; impact/usage analysis – Data citations should be first class objects for publication -- appear with citations to other works; should be as easy to cite as other works
  • 56. Additional References • Buckheit, J., & Donoho, D. L. 1995. Wavelab and reproducible research. In A. Antoniadis (Ed.), Wavelets and Statistics (pp. 55–81). New York, NY: Springer. • International Council for Science (ICSU). 2004. ICSU Report of the CSPR Assessment Panel on Scientific Data and Information. Report. • King, G. 2007. An Introduction to the Dataverse Network as an Infrastructure for Data Sharing. Sociological Methods and Research, 36. • Moore, R. 2008. Towards a Theory of Digital Preservation. International Journal of Digital Curation, 3(1). • National Science Board (NSB). 2005. Long-Lived Digital Data Collections: Enabling Research and Education in the 21st Century. NSF (NSB-05-40).
  • 57. Discussion. Contact information: Web: http://micahaltman.com E-mail: micah_altman@alumni.brown.edu Twitter: @drmaltman