
Communicating with Data 2010 Annual Meeting






  • Video recording: http://river-valley.tv/communicating-data-new-roles-for-researchers-publishers-and-libraries/.

    Presentation Transcript

    • Communicating with Data: New Roles for Researchers, Publishers and Libraries
      MacKenzie Smith
      Associate Director for Technology, MIT Libraries
      Science Commons Research Fellow, Creative Commons
      CrossRef Annual Meeting ©2010, MIT
    • The world’s first hard drive (5 MB)
      IBM Almaden Research Center, 1952-1954
      That Was Then
    • Current capacity hard drive (>2 TB)
      Google Data Center, 2010
      This is Now
    • How Much Information?
      “IDC research shows that the digital universe —information that is either created, captured, or replicated in digital form — was 281 exabytes in 2007. In 2011, the amount of digital information produced in the year should equal nearly 1,800 exabytes, or 10 times that produced in 2006. The compound annual growth rate between now and 2011 is expected to be almost 60%”
      The Diverse and Exploding Digital Universe, 2008 IDC White Paper
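The growth rate in the IDC quotation can be checked against the two sizes it gives. The following is a minimal sketch, assuming the 281 and 1,800 exabyte figures are treated as comparable start and end values four years apart; the function name `cagr` is this example's own, not something from the white paper.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate as a fraction (0.5 means 50% per year)."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted on the slide, in exabytes (assumed comparable for this check).
size_2007 = 281
size_2011 = 1800

rate = cagr(size_2007, size_2011, years=2011 - 2007)
print(f"Implied growth rate: {rate:.1%} per year")
```

Under these assumptions the implied rate comes out at roughly 59% per year, which is consistent with the "almost 60%" in the quotation.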
    • How Much Information?
      Sequence Submissions to DNA DataBank of Japan 1993-2005
    • What Is Research Data?
      Observational: e.g. sensor, telemetry, survey, sample data
      Experimental: e.g. genetic sequences, chromatograms
      Simulation: e.g. climate, economic, 3-D models
      Media: e.g. images, audio, video
      Derived/compiled: e.g. text/data mining, compiled databases
      Often expensive or impossible to reproduce
    • What Is Research Data?
      Text: e.g. flat text files, Word, PDF
      Numerical: e.g. SPSS, STATA, Excel, MySQL
      Media: e.g. JPEG, TIFF, DICOM, MPEG, QuickTime
      Models: e.g. 3D, statistical
      Software: e.g. Java, C programs
      Domain-specific: e.g. FITS in astronomy, CIF in chemistry
      Instrument-specific: e.g. Olympus confocal microscope
      Not always in neat packages like books
    • What Do Researchers Do With Data?
      Analyze (e.g. process, visualize)
      Review (evaluate methods)
      Re-use (reproduce results)
      Re-purpose (e.g. integrate)
    • Data Sharing Innovations
      New-fangled Hybrid Articles: integrate text, data and tools
      Enhanced PDFs
      Linked Open Data: access to data via Web standards to encourage large-scale interoperability
      “Data Papers”
    • Issues in Data Curation
      Storage: very large scale
      Metadata: what standard to use?
      Provenance: research methods
      Identifiers: scalability, persistence
      Preservation: see slide #5 on formats
      Sharing: laws confusing, not interoperable
    • Data Sharing Trends
      “The NIH expects and supports the timely release and sharing of final research data from NIH-supported studies for use by other researchers.” NIH grant proposal guide
      Similar data management, sharing mandates from US NSF, other funding agencies worldwide
      Journals mandating deposit
      (e.g. Journal of Evolutionary Biology)
    • Data Interoperability
      IPR and data licenses
      Lots of data not copyrightable since facts cannot be copyrighted
      UK, EU, some other countries have sui generis data rights
      Laws not “interoperable”
      Big problem for international scientific collaborations and data re-purposing
    • Libraries and Data
      Established curation for some data types
      statistical (Harvard-MIT Data Center)
      geospatial (Geodata Repository)
      bioinformatics (via NLM/NCBI)
      digital media (e.g. images, videos)
      datasets (IR digital archives)
    • Libraries and Data
      Applies to both faculty-authored and externally-acquired data
      Consultation services (in-person, via Website)
      Liaise with data archives (e.g. ICPSR)
      Develop (meta)data standards (e.g. DDI)
      Manage and preserve data
    • Robotics Data in DSpace@MIT
      The Library:
      Defined local taxonomy for metadata values
      Customized metadata records
      Adapted/simplified deposit workflow
      Loaded data from previous repository
      Added CC0 licenses
      Review of new deposits done by community
    • New Roles for Scholarly Data Communication
    • Researcher’s Role: Data Provision
      e.g. Sage Commons
      “The Sage Commons is a novel information platform being built by an international partnership of researchers and stakeholders to define the molecular basis of disease and guide the development of effective human therapeutics and diagnostics.
      The Sage Commons will be used to integrate diverse molecular mega-data sets, to build predictive bionetworks and to offer advanced tools proven to provide unique new insights into human disease biology.  Users will also be contributors that advance the knowledge base and tools through their cumulative participation.
      The public access mission of the Sage Commons requires the development of a new strategic and legal framework to protect the rights of contributors while providing widespread access to integrative genomics resources.”
    • Library’s Role: Data Curation
      Data organization and annotation
      e.g. ontologies and metadata
      Data archiving, preservation
      e.g. perpetual access
      Outreach and support to local researchers
    • Publisher’s Role: Data Accreditation
      Require data deposit to archives
      Publish data journals
      Manage peer review (quality control)
      Provide credit for data publishing
      (evolution of promotion & tenure system)
    • Data Papers Revisited
      “a formal publication whose primary purpose is to expose and describe data, as opposed to analyze and draw conclusions from it.”
      Organize peer-review, establish quality-control measures
      Create citable entity
      Establish cross-linking mechanisms with traditional papers, to enforce separation of concerns (methodology vs analysis)
      Specify required documentation to make data re-usable, re-purposable
      Apply standard interoperable legal license (CC0 or PDDL with normative attribution, CC-BY with URI attribution)
      Ensure archiving strategy in place
      Jonathan Rees, Recommendations for independent scholarly publication of data sets, Creative Commons Working Paper, March 2010.
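The data paper recommendations above could be read as a checklist that a publisher or repository applies to each submission. The following is a rough sketch of that idea only: the `DataPaper` record, its field names, the license labels and the example DOI are all hypothetical and do not come from the Rees working paper or any real system.

```python
from dataclasses import dataclass, field

# Licenses named on the slide; the short labels are this sketch's own.
RECOMMENDED_LICENSES = {"CC0", "PDDL", "CC-BY"}


@dataclass
class DataPaper:
    """Hypothetical record describing one data paper submission."""
    title: str
    identifier: str = ""       # citable identifier, e.g. a DOI
    license: str = ""          # one of RECOMMENDED_LICENSES
    peer_reviewed: bool = False
    documentation: str = ""    # what a re-user needs to know about the data
    archive: str = ""          # where the data set is preserved
    linked_papers: list[str] = field(default_factory=list)  # related articles


def missing_requirements(paper: DataPaper) -> list[str]:
    """Return the checklist items this record does not yet satisfy."""
    problems = []
    if not paper.identifier:
        problems.append("citable identifier")
    if paper.license not in RECOMMENDED_LICENSES:
        problems.append("standard license")
    if not paper.peer_reviewed:
        problems.append("peer review")
    if not paper.documentation:
        problems.append("documentation")
    if not paper.archive:
        problems.append("archiving strategy")
    return problems


# Made-up example: licensed and reviewed, but not yet documented or archived.
draft = DataPaper(
    title="Example survey data set",
    identifier="doi:10.9999/example",  # placeholder, not a real DOI
    license="CC0",
    peer_reviewed=True,
)
print(missing_requirements(draft))  # ['documentation', 'archiving strategy']
```

Cross-links to traditional papers are stored but not enforced here, since the slide describes a linking mechanism rather than a per-paper requirement.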
    • Questions?