Supporting the Research Data Life Cycle at CDL
Presentation on Day 1 of Data Management Workshop at the University of Florida, July 2012


Published in: Education, Technology

  • Image credits (left to right): http://www.flickr.com/photos/bioproduct/367245894/, by ingserban; http://content.cdlib.org/ark:/13030/kt8d5nf0sg/?order=1, courtesy of Loyola Marymount University, Department of Archives and Special Collections, William H. Hannon Library; http://content.cdlib.org/ark:/13030/kt400024ww/?order=3, courtesy of Mission Viejo Library; http://content.cdlib.org/ark:/13030/kt167nd02p/?order=1, courtesy of the Riverside Public Library; http://www.flickr.com/photos/lyght55/3649248315/, by Evan Wohrman; http://content.cdlib.org/ark:/13030/kt2199r37d/?order=1, courtesy of Loyola Marymount University, Department of Archives and Special Collections, William H. Hannon Library; http://content.cdlib.org/ark:/13030/kt767nf42x/?order=1, courtesy of the Agua Caliente Cultural Museum; http://content.cdlib.org/ark:/13030/kt9v19q5hg/?order=1, courtesy of the Orange Public Library; http://content.cdlib.org/ark:/13030/kt7h4nf50b/?order=1, courtesy of the Agua Caliente Cultural Museum.
  • Image credit: http://www.flickr.com/photos/bleuman/6160605143/, by Yunchung Lee. As Sherry described to you, research has a life cycle.
  • Supporting the Manage and Share stages are the Merritt curation and preservation repository and our long-term identifier service, EZID. Baking data curation into the collection phase: DataUp. And enhancing collection of e-science in the web environment: the Web Archiving Service, or WAS. To facilitate data publication, we are exploring this new Data Paper model. We are engaged in a number of network-level collaborations and partnerships, but these two have particular relevance to the data management space, with DataONE focused on distributed data networks and DataCite on persistent identifiers. And lastly, we have partnered with UVA and many others to develop and launch an easy-to-use Data Management Plan Tool. Let’s take a brief look at all of these things, and then we’ll talk about what this means to you.
  • Current work includes the Datashare project at UC San Francisco (UCSF). Datashare, as the name suggests, encourages researchers to share their data. See the Datashare website at http://datashare.ucsf.edu
  • In its capacity as a data management tool, Merritt can function in one of several ways: it can be a “dark” or inaccessible archive for important digital assets; it can serve as a “bright” archive with direct discovery and access; it can be the preservation back-end for existing or new discovery and content management systems; or it can integrate with distributed data grids.
  • Preservation: Curation micro-services and Merritt. To bake data curation into data creation: DCXL (Data Curation XL Plug-In). To enhance data sharing, collecting and gathering: WAS service. To facilitate data publication, we are exploring this new Data Paper model. And behind many of these steps, the EZID service. We are engaged in a number of network-level collaborations and partnerships, but these two have particular relevance to the data management space, with DataONE focused on distributed data networks and DataCite on persistent identifiers. And lastly, we have partnered with UVA and many others to develop and launch an easy-to-use Data Management Plan Tool. So let’s take a brief look at all of these things, and while I’m there, I’ll dive more deeply into EZID, which is the service I manage.
  • Nobody thinks of Excel as a preservation-ready tool, but everybody uses it! The key idea in keeping this easy is: let them use the tools they are used to using. (Get out of the way of that elephant!) The Gordon & Betty Moore Foundation and Microsoft Research are funding this. Our part is requirements gathering; MS will do development. It will be an open-source plug-in.
  • WAS allows curators to collect and manage web-published content so that scholars can use the content for private research and/or publish the content for public access. The archives contain eScience content as well as government documents, event captures, and archives for specific research communities, such as unique data sets, collections of sites not otherwise grouped together, and the sites resulting from grant activity.
  • Public or private: WAS provides tools for analyzing site change over time and allows keyword searching of archived sites; publishing an archive is optional. As of this writing, there are 93 active archives, over half of which are publicly available.
  • We are exploring this idea with various partners and funders, potentially to encourage conventions for describing data so that it can stand alone when appropriate.
  • (3 clicks) DataONE is an NSF-funded virtual data center for biology, ecology, and environmental sciences. DataONE has the overarching goal of building a new culture of data access and data sharing. This is an international collaboration working with scientists and librarians, as well as other stakeholders. Its aims: engaging the scientist in the data curation process; supporting the full data life cycle; encouraging data stewardship and sharing; promoting best practices; engaging citizens; developing domain-agnostic solutions.
  • Carly has already talked to you about DataONE but I wanted to give you a little picture of the scope and depth of this work.
  • How can EZID be in the business of issuing DataCite DOIs? California Digital Library was one of the founding members. DataCite was indeed formed in 2009 by 10 libraries and research centers with a mission: “Helping you find, access, and reuse data.” The number has now grown to 16. In addition there are 3 associate members, including the Korea Institute of Science and Technology Information and BGI, so there is a presence in Asia. DataCite’s primary methodology for achieving this mission: issuing DOIs (Digital Object Identifiers) for datasets.
  • These are the factors driving the collaboration: Institutions rely on soft funding… agencies have created a new demand: meet the demand or don’t get funded. The approach is to work collaboratively to consolidate expertise and reduce costs. Libraries plus, plus. Provide an environment that allows researchers to focus on research.
  • Image credit: http://content.cdlib.org/ark:/13030/kt667nc4xn/?query=service%20station&brand=calisphere, courtesy of Anaheim Public Library
  • Image credit: http://content.cdlib.org/ark:/28722/bk0007s853c/?query=tools&brand=calisphere, courtesy of UC Berkeley, Bancroft Library; United Aircraft Corporation: Joint War Production Drive Committee. Data curation leads to good outcomes for researchers. They’ll be motivated routinely to deposit in stable public storage data products (datasets and processing information) and the data papers that reward them with authorship credit. Data journals will spring up around disciplines, even if disciplinary data papers are scattered across geographically distributed repositories. Data products will be re-used, annotated, corrected, and precisely linked to from traditional publications. Data products will enter the scientific record instead of being lost.
  • While you are thinking about these matters, you can also take advantage of a growing community of practice around curation issues, curation services, and tools. We’ll talk some more about this on Day 3. Ring leaders are our UC3 partner at UC San Diego, Declan Fleming, and our good friend Mike Giarlo at Penn State. Virtual options include: listserv, Google group, Twitter, Facebook, chat, and wiki. Visit the website for details.
  • Transcript

    • 1. Supporting the Research Data Life Cycle at CDL. Summer, 2012. Joan Starr. University of Florida Data Management Workshop
    • 2. Research has a life cycle. (Cycle diagram: PLAN, COLLECT, MANAGE, SHARE, PUBLISH)
    • 3. Data Management Life Cycle Support. TOOLS & SERVICES: • To enable data preservation • To bake data curation into data creation • To enhance data sharing, collecting and gathering • To facilitate data publication. PARTNERSHIPS: • To promote data discovery and access • To help researchers comply with new requirements. (Cycle diagram: PLAN, COLLECT, MANAGE, SHARE, PUBLISH)
    • 4. Examples from CDL & UC3. TOOLS & SERVICES: • Merritt, EZID (MANAGE, SHARE) • DataUp, WAS (COLLECT) • Data Paper model (PUBLISH). PARTNERSHIPS: • DataONE & DataCite (COLLECT, MANAGE, SHARE) • Data Management Plan Tool (PLAN)
    • 5. • Curation repository open to the UC community and beyond • Discipline / content agnostic • Micro-services architecture • Easy-to-use UI or API • Hosted or locally deployed. Primary Functions: 1. Deposit 2. Manage (metadata, versions, etc.) 3. Access (expose) 4. Share (with other researchers) 5. Preserve
    • 6. https://merritt.cdlib.org/ • Dark archive for important digital assets • Bright archive with direct discovery and access • Preservation back-end for existing or new discovery and content management systems and services • Integration with distributed data grids
    • 7. Examples from CDL & UC3. TOOLS & SERVICES: • Merritt, EZID (MANAGE, SHARE) • DataUp, WAS (COLLECT) • Data Paper model (PUBLISH). PARTNERSHIPS: • DataONE & DataCite (COLLECT, MANAGE, SHARE) • Data Management Plan Tool (PLAN)
    • 8. EZID: long-term identifiers made easy. http://n2t.net/ezid Take control of the management and distribution of your research, share and get credit for it, and build your reputation through its collection and documentation. Primary Functions: 1. Create long-term identifiers 2. Manage identifiers over time 3. Manage associated metadata over time
    • 9. What this means…
    • 10. What this means…
    • 11. Examples from CDL & UC3. TOOLS & SERVICES: • Merritt, EZID (MANAGE, SHARE) • DataUp, WAS (COLLECT) • Data Paper model (PUBLISH). PARTNERSHIPS: • DataONE & DataCite (COLLECT, MANAGE, SHARE) • Data Management Plan Tool (PLAN)
    • 12. Data Curation for Excel • Excel is the database of choice for many researchers • How to encourage data sharing, archiving, and publishing? – Self-description – Enhance discovery – Facilitate the determination of suitability for use. Primary Functions: 1. Metadata description (through extraction and augmentation) 2. Check export compatibility 3. Transfer to repository. Surveys indicate: • Most researchers are unaware of preservation options • Documentation practices are poor • Excel is just one tool in workflows
    • 13. Web Archiving Service (WAS). Capture today’s web, build tomorrow’s archive. http://webarchives.cdlib.org/ Primary Functions: 1. Collect web-published content 2. Manage content 3. Use content for private research 4. Publish content for public access
    • 14. WAS: a range of uses and users • archives for research communities • events • web content for private study and analysis • organizations’ web presence
    • 15. Examples from CDL & UC3. TOOLS & SERVICES: • Merritt, EZID (MANAGE, SHARE) • DataUp, WAS (COLLECT) • Data Paper model (PUBLISH). PARTNERSHIPS: • DataONE & DataCite (COLLECT, MANAGE, SHARE) • Data Management Plan Tool (PLAN)
    • 16. Data Publication Vision: “data paper” • Wrap the unfamiliar in a familiar façade • Minimally, a cover sheet and a set of links to archived artifacts • Cover sheet contains familiar elements: title, date, authors, abstract, identifiers • Just enough metadata to permit basic exposure and discovery – Indexing by services such as Web of Science, Google Scholar – Instilling confidence in the identifier’s stability
    • 17. Examples from CDL & UC3. TOOLS & SERVICES: • Merritt, EZID (MANAGE, SHARE) • DataUp, WAS (COLLECT) • Data Paper model (PUBLISH). PARTNERSHIPS: • DataONE & DataCite (COLLECT, MANAGE, SHARE) • Data Management Plan Tool (PLAN)
    • 18. Working at the Network Level. Enable new science and knowledge creation through universal access to data about life on earth and the environment that sustains it. 1. Build on existing cyberinfrastructure 2. Create new cyberinfrastructure 3. Create new communities of practice
    • 19. DataONE’s new infrastructure https://www.dataone.org/
    • 20. http://datacite.org/
    • 21. Examples from CDL & UC3. TOOLS & SERVICES: • Merritt, EZID (MANAGE, SHARE) • DataUp, WAS (COLLECT) • Data Paper model (PUBLISH). PARTNERSHIPS: • DataONE & DataCite (COLLECT, MANAGE, SHARE) • Data Management Plan Tool (PLAN)
    • 22. DMPTool. https://dmp.cdlib.org/ Meeting funding agencies’ data management plan requirements. Coalition partners: • CDL • DataONE • Digital Curation Centre • Smithsonian Institution • UCLA Library • UCSD Libraries • University of Illinois • University of Virginia Libraries. Primary Functions: 1. Step-by-step “wizard” 2. Templates and examples 3. Links to institutional resources and agency information 4. Plan publication and sharing
    • 23. What can this mean for you? SERVICES! • Open source: DataUp, Data Management Plan tool • Off the shelf: Merritt, EZID, WAS
    • 24. & what can it mean to researchers? TOOLS! • For organizing their data: DataUp, EZID • To keep their data safe: Merritt • To help them get grants: Data Management Plan tool • To help get their work noticed: EZID, Data Papers • To help them find other data: EZID, Data Papers
    • 25. But wait, there’s more: Community! • CURATECamp: unconference events connecting practitioners & technologists interested in digital curation and data management. • For f2f events: http://curatecamp.org/ • http://groups.google.com/group/digital-curation (Image courtesy of Oxnard Public Library, http://content.cdlib.org/ark:/13030/kt6c600758)
    • 26. and more information! http://www.cdlib.org/uc3 UC Curation Center, uc3@ucop.edu. UC3/CDL: Stephen Abrams, Patricia Cruse, Scott Fisher, Erik Hetzner, Greg Janée, John Kunze, David Loy, Mark Reyes, Abhishek Salve, Tracy Seneca, Joan Starr, Marisa Strong, Perry Willett
    • 27. …and here’s how to find me. Joan Starr, joan.starr@ucop.edu, @joan_starr, http://www.slideshare.net/joanstarr
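
The “data paper” on slide 16 is described as minimally a cover sheet (title, date, authors, abstract, identifiers) plus a set of links to archived artifacts. A minimal sketch of that structure in Python might look like the following. The class name, field names, citation format, and all example values (the DOI-like string and the example.org link) are illustrative assumptions, not part of any CDL, EZID, or DataCite specification.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class DataPaperCoverSheet:
    """Hypothetical cover sheet holding the elements listed on slide 16."""
    title: str
    published: date
    authors: List[str]
    abstract: str
    identifiers: List[str]  # long-term identifiers; values used here are placeholders
    artifact_links: List[str] = field(default_factory=list)  # links to archived artifacts

    def missing_elements(self) -> List[str]:
        """Return the names of cover-sheet elements that are empty."""
        elements = {
            "title": self.title,
            "authors": self.authors,
            "abstract": self.abstract,
            "identifiers": self.identifiers,
            "artifact_links": self.artifact_links,
        }
        return [name for name, value in elements.items() if not value]

    def citation(self) -> str:
        """Build a simple citation string; the format is an assumption."""
        names = "; ".join(self.authors)
        identifier = self.identifiers[0] if self.identifiers else ""
        return f"{names} ({self.published.year}). {self.title}. {identifier}".strip()


# Placeholder values for illustration only.
sheet = DataPaperCoverSheet(
    title="Example stream temperature dataset",
    published=date(2012, 7, 1),
    authors=["A. Researcher"],
    abstract="Placeholder abstract.",
    identifiers=["doi:10.9999/EXAMPLE"],
    artifact_links=["https://example.org/archive/data.csv"],
)
print(sheet.missing_elements())
print(sheet.citation())
```

The `missing_elements` check reflects the “just enough metadata” idea: a cover sheet with any empty element would not yet support basic exposure and discovery.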