Communicating with Data 2010 Annual Meeting

Published in: Technology, Education
  • Video recording: http://river-valley.tv/communicating-data-new-roles-for-researchers-publishers-and-libraries/.

Transcript

  • 1. MacKenzie Smith, Associate Director for Technology, MIT Libraries; Science Commons Research Fellow, Creative Commons. CrossRef Annual Meeting ©2010, MIT
  • 4. That Was Then: the world’s first hard drive (5 MB), IBM Almaden Research Center, 1952-1954
  • 5. This Is Now: current-capacity hard drive (>2 TB), Google Data Center, 2010
  • 6. How Much Information? “IDC research shows that the digital universe — information that is either created, captured, or replicated in digital form — was 281 exabytes in 2007. In 2011, the amount of digital information produced in the year should equal nearly 1,800 exabytes, or 10 times that produced in 2006. The compound annual growth rate between now and 2011 is expected to be almost 60%” (The Diverse and Exploding Digital Universe, 2008 IDC White Paper)
  • 7. How Much Information? [Chart: sequence submissions to the DNA Data Bank of Japan, 1993-2005]
  • 8. What Is Research Data? Observational (e.g. sensor, telemetry, survey, sample data); Experimental (e.g. genetic sequences, chromatograms); Simulation (e.g. climate, economic, 3-D models); Media (e.g. images, audio, video); Derived/compiled (e.g. text/data mining, compiled databases). Often expensive or impossible to reproduce.
  • 9. What Is Research Data? Text (e.g. flat text files, Word, PDF); Numerical (e.g. SPSS, STATA, Excel, MySQL); Media (e.g. JPEG, TIFF, DICOM, MPEG, QuickTime); Models (e.g. 3D, statistical); Software (e.g. Java, C programs); Domain-specific (e.g. FITS in astronomy, CIF in chemistry); Instrument-specific (e.g. Olympus confocal microscope). Not always in neat packages like books.
  • 10. What Do Researchers Do With Data? Analyze (e.g. process, visualize); Share; Review (evaluate methods); Annotate; Cite; Re-use (reproduce results); Re-purpose (e.g. integrate).
  • 11. Data Sharing Innovations. New-fangled hybrid articles; integrate text, data and tools; enhanced PDFs; Linked Open Data; access to data via Web standards to encourage large-scale interoperability; “Data Papers”.
  • 12. Issues in Data Curation. Storage (very large scale); Metadata (what standard to use?); Provenance (research methods); Identifiers (scalability, persistence); Preservation (see slide #5 on formats); Sharing (laws confusing, not interoperable).
  • 13. Data Sharing Trends. “The NIH expects and supports the timely release and sharing of final research data from NIH-supported studies for use by other researchers.” (NIH grant proposal guide) Similar data management and sharing mandates from US NSF and other funding agencies worldwide. Journals mandating deposit (e.g. Journal of Evolutionary Biology).
  • 14. Data Interoperability: IPR and data licenses. Lots of data not copyrightable since facts cannot be copyrighted; UK, EU, some other countries have sui generis data rights; laws not “interoperable”. Big problem for international scientific collaborations and data re-purposing.
  • 15. BWIN presentation ©2010, MIT
  • 16. Libraries and Data. Established curation for some data types: statistical (Harvard-MIT Data Center); geospatial (Geodata Repository); bioinformatics (via NLM NCBI); digital media (e.g. images, videos); datasets (IR digital archives).
  • 18. Libraries and Data. Applies to both faculty-authored and externally-acquired data: consultation services (in-person, via Website); liaise with data archives (e.g. ICPSR); develop (meta)data standards (e.g. DDI); manage and preserve data.
  • 20. Robotics Data in DSpace@MIT. The Library: defined local taxonomy for metadata values; customized metadata records; adapted/simplified deposit workflow; loaded data from previous repository; added CC0 licenses. Review of new deposits done by community.
  • 23. Researcher’s Role: Data Provision, e.g. Sage Commons. “The Sage Commons is a novel information platform being built by an international partnership of researchers and stakeholders to define the molecular basis of disease and guide the development of effective human therapeutics and diagnostics. The Sage Commons will be used to integrate diverse molecular mega-data sets, to build predictive bionetworks and to offer advanced tools proven to provide unique new insights into human disease biology. Users will also be contributors that advance the knowledge base and tools through their cumulative participation. The public access mission of the Sage Commons requires the development of a new strategic and legal framework to protect the rights of contributors while providing widespread access to integrative genomics resources.”
  • 24. Library’s Role: Data Curation. Data organization and annotation (e.g. ontologies and metadata); data archiving, preservation (e.g. perpetual access); outreach and support to local researchers.
  • 25. Publisher’s Role: Data Accreditation. Require data deposit to archives; publish data journals; manage peer review (quality control); provide credit for data publishing (evolution of promotion & tenure system).
  • 26. Data Papers Revisited. “a formal publication whose primary purpose is to expose and describe data, as opposed to analyze and draw conclusions from it.” 1. Organize peer-review, establish quality-control measures. 2. Create citable entity. 3. Establish cross-linking mechanisms with traditional papers, to enforce separation of concerns (methodology vs analysis). 4. Specify required documentation to make data re-usable, re-purposable. 5. Apply standard interoperable legal license (CC0 or PDDL with normative attribution, CC-By with URI attribution). 6. Ensure archiving strategy in place. Source: Jonathan Rees, Recommendations for independent scholarly publication of data sets, Creative Commons Working Paper, March 2010, http://neurocommons.org/report/data-publication.pdf
  • 27. Questions?
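
The IDC figures quoted on slide 6 can be sanity-checked with a short calculation. The sketch below is not part of the talk: the helper name `cagr` is hypothetical, and it assumes the quoted 2007 and 2011 figures (281 and 1,800 exabytes) are comparable year-on-year quantities four years apart, and that "10 times that produced in 2006" implies roughly 180 exabytes for 2006.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values `years` apart."""
    if start_value <= 0 or end_value <= 0 or years <= 0:
        raise ValueError("values and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted on slide 6 (IDC white paper, 2008), in exabytes.
SIZE_2007_EB = 281.0
SIZE_2011_EB = 1800.0
# Assumption: "10 times that produced in 2006" means 2006 was about 180 EB.
SIZE_2006_EB = SIZE_2011_EB / 10

# Implied growth rate over the four years 2007-2011.
rate_from_2007 = cagr(SIZE_2007_EB, SIZE_2011_EB, years=2011 - 2007)
# Implied growth rate over the five years 2006-2011.
rate_from_2006 = cagr(SIZE_2006_EB, SIZE_2011_EB, years=2011 - 2006)

print(f"2007-2011 implied annual growth: {rate_from_2007:.1%}")
print(f"2006-2011 implied annual growth: {rate_from_2006:.1%}")
```

Under these assumptions both rates come out just under 60% per year, which is consistent with the "almost 60%" compound annual growth rate in the quotation.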
