Opening up data – Jisc and CNI conference 10 July 2014

MacKenzie Smith, university librarian, University of California, Davis

  1. 1. July 10, 2014 JISC-CNI 2014 ©UC Regents, 2014 1 MacKenzie Smith University Librarian University of California, Davis
  3. 3. At Creative Commons, we believe scientific data should be freely available to everyone. We call this idea Open Data. Creative Commons legal tools can be used to make data and databases freely available. We’ve already had successful implementations in taxonomic, energy, genomics, disease research, geospatial, polar, and bibliometric disciplines, and are providing guidance to funders, institutions, private foundations, governments, the corporate sector, and other stakeholders. Read more about Creative Commons and data.
  4. 4. • NIH (2003): “The NIH expects and supports the timely release and sharing of final research data from NIH-supported studies for use by other researchers.” (>$500,000, include data sharing plan) • NSF grant guidelines (2005 and earlier): “NSF ... expects investigators to share with other researchers, at no more than incremental cost and within a reasonable time, the data, samples, physical collections and other supporting materials created or gathered in the course of the work. It also encourages grantees to share software and inventions or otherwise act to make the innovations they embody widely useful and usable.” • NSF peer-reviewed Data Management Plan (DMP), January 2011
  5. 5. 3/13/2014 ©UC Regents, 2014 5
  7. 7. Policy category (2011; 2012) • Required as condition of publication, barring exceptions: 10.6%; 11.2% • Required but may not affect editorial decisions: 1.7%; 5.9% • Encouraged/addressed, may be reviewed and/or hosted: 20.6%; 17.6% • Implied: 0%; 2.9% • No mention: 67.1%; 62.4% Source: Stodden, Guo, Ma (2013) PLoS ONE, 8(6)
  8. 8. Policy category (2011; 2012) • Required as condition of publication, barring exceptions: 3.5%; 3.5% • Required but may not affect editorial decisions: 3.5%; 3.5% • Encouraged/addressed, may be reviewed and/or hosted: 10%; 12.4% • Implied: 0%; 1.8% • No mention: 82.9%; 78.8% Source: Stodden, Guo, Ma (2013) PLoS ONE, 8(6)
  9. 9. JASA June (Computational Articles; Code Publicly Available) • 1996: 9 of 20; 0% • 2006: 33 of 35; 9% • 2009: 32 of 32; 16% • 2011: 29 of 29; 21%
  10. 10. • Executive Memorandum directing federal funding agencies to develop plans for public access to data and publications (Feb 2013): “data is defined... as the digital recorded factual material commonly accepted in the scientific community as necessary to validate research findings including data sets used to support scholarly publications...” • Executive Order directing federal agencies to make their own data publicly available (May 9)
  11. 11. • Consists of digital assets • Datasets, papers, software, lab notes • Each asset is uniquely identified and has provenance, including access control • e.g., publishing simply involves changing the access control • Digital assets are interoperable across the enterprise
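The asset model on slide 11 can be sketched in a few lines. This is only an illustration under stated assumptions: the class name `DigitalAsset`, the `Access` levels, and the shape of the provenance log are hypothetical and do not come from the talk or from any particular repository system. It shows an asset with a unique identifier and a provenance trail, where "publishing" is nothing more than a change of access control.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from uuid import uuid4


class Access(Enum):
    """Hypothetical access levels; a real system would likely have more."""
    PRIVATE = "private"
    PUBLIC = "public"


@dataclass
class DigitalAsset:
    """Illustrative asset: a dataset, paper, software package, or lab note."""
    kind: str
    title: str
    access: Access = Access.PRIVATE
    # Unique identifier (a random UUID here; an assumption, not a DOI).
    asset_id: str = field(default_factory=lambda: uuid4().hex)
    # Provenance log of (timestamp, event, access level at that time).
    provenance: list = field(default_factory=list)

    def __post_init__(self):
        self._record("created")

    def _record(self, event):
        stamp = datetime.now(timezone.utc).isoformat()
        self.provenance.append((stamp, event, self.access.value))

    def publish(self):
        """Publishing only flips the access control and logs the change."""
        self.access = Access.PUBLIC
        self._record("published")


dataset = DigitalAsset(kind="dataset", title="Example survey responses")
dataset.publish()
```

Because the asset itself never moves or changes form, the same record can be cited before and after publication; only its access level and provenance log differ.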
  12. 12. Barriers to sharing (Code; Data) • Time to document and clean up: 77%; 54% • Dealing with questions from users: 52%; 34% • Not receiving attribution: 44%; 42% • Possibility of patents: 40%; - • Legal Barriers (e.g. copyright): 34%; 41% • Time to verify release with admin: -; 38% • Potential loss of future publications: 30%; 35% • Competitors may get an advantage: 30%; 33% Survey of the Machine Learning Community, NIPS (Stodden 2010)
  13. 13. Pass National Data Breach Legislation that provides for a single national data breach standard, along the lines of the Administration's 2011 Cybersecurity legislative proposal.
  15. 15. • Infrastructure • Developing new tools across the research life cycle • Mostly individual institutions or disciplines • National initiatives emerging (e.g. ARL/AAU/APLU SHARE initiative) • Policy • Institutional Open Access policies • SHARE copyright group • Training • ARL e-science institute • ARL spec kit on RDM activities • Current events
  16. 16. • Dissemination Platforms, e.g. DataONE, DataVerse, RunMyCode.org • Workflow Tracking and Research Environments, e.g. VisTrails, Kepler, Taverna • Embedded Publishing, e.g. Sweave, Knitr, VCR (Verifiable Computational Research)
  17. 17. • Disciplinary • ICPSR, Genbank • Dryad, ONEShare • Sage Commons (Sage Bionetworks) • Disciplinary/Institutional • DataVerse, Nesstar • Institutional • IRs galore: e.g., UC’s Dash and Chronopolis, Purdue’s PURR, JHU’s Data Conservancy, Stanford Digital Repository, many local DSpace/Fedora/Hydra/Islandora instances, locally run and cloud hosted • Data Centers on every campus • Generic/cloud • Figshare • DuraCloud
  18. 18. We continued to refine the infrastructure for linking between articles and data. The web service for returning the corresponding Dryad data DOI when queried with an article DOI is now being used by Elsevier to provide a link to the data from ScienceDirect for 40 different Elsevier journals that have at least one data package in Dryad. Dryad is an international collaborator in the EU-funded ORCID DataCite interoperability Network Project (odinproject.eu), which this past year introduced a tool enabling researchers to add research outputs with DataCite DOIs (such as Dryad data packages) to their ORCID profiles. We also introduced regular updating of linkages between related records in PubMed, Genbank, and EuropePMC to data packages in Dryad. To further promote discoverability and accessibility, Dryad officially became a DataONE Tier 1 member node. Improvements to the curation interface have led to an increase in curation efficiency of greater than 25% in the past year. Dryad Annual Report, 2013
  19. 19. Embargo selections of Dryad data authors for the 10,108 files in Dryad deposited by September 20, 2013. Data include only datasets related to articles published in journals for which the authors had the option of selecting an embargo. (B) Longer term embargoes (>1 year) by journal that granted them. Data Archiving: Suggestions to Increase Participation. PLoS Biol 12(1): e1001779. doi:10.1371/journal.pbio.1001779
  23. 23. • COMING: NIH data catalog (part of the BD2K initiative); SHARE registry • HERE NOW: Thomson Reuters Data Citation Index; OCLC WorldShare (includes OAIster); Google/Google Scholar
  24. 24. • DOIs for Data (DataCite, CrossRef, EZID) • ORCIDs for Researchers • FundRef for funding agencies • Still missing good institutional identifiers
  27. 27. • An IP rights strategy, including the promotion of university-based Open Access policies and favorable licensing terms, will be part of the scaffolding that will enable the layers of SHARE to develop. • Rights subgroup formed to deal with this • A broad collective action by AAU and APLU – to be discussed with AAU Presidents in April
  28. 28. Key finding: RDM Service Offering (bar chart; groups: Data management planning, Data management support, Data sharing & archiving) • Data archiving by library: 40 • Data sharing & access support: 22 • Data citation support: 38 • Research metadata support: 42 • Other Data Management…: 23 • DMP training: 33 • DMP consulting: 48 • Online DMP resources: 47 ARL SPEC Kit 334: Research Data Management Services (July 2013) http://publications.arl.org/Research-Data-Management-Services-SPEC-Kit-334/
  29. 29. • DMP consulting: 89% (N = 48) • DMP training: 61% (N = 33) ARL SPEC Kit 334: Research Data Management Services (July 2013) http://publications.arl.org/Research-Data-Management-Services-SPEC-Kit-334/
  32. 32. An ACRL e-Learning Online Course, July 14-August 1, 2014 Description: Demand for data management plan consultants is growing as more granting agencies add this requirement. Most presentations concerning data management do not provide practical advice on how to consult with researchers writing a data management plan for grant submission. This course teaches participants about the elements of a successful data management plan, and provides practice critiquing data management plans in a supportive learning environment where no grant funding is at stake. Join two experienced data management plan consultants with experience in liaison librarianship and information technology as they demonstrate how all librarians have the ability to successfully consult on data management plans. Each week will include assigned readings, a written lecture, discussion questions, weekly assignments, and live chats with the instructors. Participants will examine how data and metadata are defined, open data formats, dark archives, and secure repositories, and will address specialty concerns such as how to securely preserve information related to at-risk populations. Selection of effective long-term data preservation and sharing strategies will also be examined. Lastly, participants will evaluate sample data management plans from the sciences, social sciences, and the arts and humanities as a final project for the course. Critiques of each plan will be presented to the class during the final chat session at the end of the course. Learning Outcomes: • List specific data depository resources in order to formulate recommendations for researchers to securely deposit and share their data. • Learn about how different funding agencies, and departments within those agencies, have different requirements for data management plans in order to determine how to effectively advise each researcher according to the requirements for their specific plan. • Analyze sample data management plans in order to develop an understanding of what constitutes a thorough data management plan. Presenters: Dee Ann Allison, Professor, University of Nebraska-Lincoln; Kiyomi Deards, Assistant Professor, University of Nebraska-Lincoln Course Requirements: Your participation will require approximately three to five hours per week of primarily asynchronous activities to: • Read the online seminar material • Post to online discussion boards • Synchronous chat sessions (optional) • Complete online exercises • Complete a seminar evaluation form
  35. 35. “CLIR Postdoctoral Fellows work on projects that forge and strengthen connections among library collections, educational technologies, and current research. The program offers recent PhD graduates the chance to help develop research tools, resources, and services while exploring new career opportunities. Host institutions benefit from fellows' field-specific expertise by gaining insights into their collections' potential uses and users, scholarly information behaviors, and current teaching and learning practices within particular disciplines.” • >110 fellows so far • UC Davis postdoc in neuroscience: Jonathan Cachat
  36. 36. “Painstakingly detailed surveys have been performed across several research organizations, particularly in North America (CLIR; ARL; CDL), Europe (DCC; RIN; NESTA) and Australia (ANDS). The same overall picture emerges: • Research data is found in a dizzying number of file formats (some proprietary) • Research data can be digital or non-digital • Lack of metadata & documentation • Data storage is desperate, unorganized, unsecured and researchers need more space • Researchers welcome help with federal funding mandates (Data Management Plans) • PIs are not concerned with data sharing preparation – a time consuming, thankless job in the current publish-or-perish merit system”
  37. 37. “There is ample evidence of a need for research data management services as provided by reports published from libraries and organizations (cited above). However, it is critical to recognize that sloppy record keeping and the constant, fast-paced strive for bigger, faster, stronger technological infrastructure are inherent to the scientific enterprise. Any services that sterilize or mandate rigid process control may provide solutions to specific data concerns, but will do so at a detriment to science – not an ideal solution” Amari, Beltrame, Bjaalie, & Dalkara, 2002; Gardner et al., 2003; Kubilius, 2014; Landreth & Silva, 2013; Wallis et al., 2013; White, Baldridge, Brym, Locey, & McGlinn, 2013.
  38. 38. “Mandated changes that are detrimental to the flow rate of a daily research enterprise will not be successful. This challenges the core of research data management, curation and service efforts. It highlights the fact that sometimes efforts to help an external group (e.g., neuroscientists) with internal expertise (e.g., library skill sets), even with the best intentions and solid rationale, can be unhelpful and unsustainable.” “The problem we are trying to solve is advancing the environmental support and training provided by the university to researchers and students in order to fulfill its mission. Researchers and students are aware of the growing popularity and potential of big data, open data, interdisciplinary data. They desire opportunities, skills and support. Advancing the environmental support will improve their research, it will improve their education – it gives them an edge, and for that a university is recognized.”
  39. 39. • Less emphasis on infrastructure • More emphasis on policy • Citation practices in different research disciplines for data, software • Legal tools for data and software sharing in different contexts • Lots more emphasis on training and culture change • Not of librarians, but researchers themselves