Understanding data services in distributed and collaborative research settings


Published in: Education

  • Good morning. I'm Erik Mitchell, an Assistant Professor at the University of Maryland, College Park. My co-presenter, Jeffery Loo of UC Berkeley, is unable to be here due to a family emergency. I will do my best to represent his sections, and I encourage you to get in touch with him if you have questions about UC Berkeley-specific initiatives.
  • Focus of talk: How can libraries support collaborative and distributed research (CDR), and what does this support mean for data services? The presentation is grounded in three case studies. We will explore a case study that shows how Jeff, Susan Rathbun-Grubb, and I used CDR tools in our own research; examine two case studies that describe how libraries can support CDR through tools, instruction, and outreach; and consider how CDR fits into library services as a whole and what challenges and opportunities exist. So, what is collaborative and distributed research? Let's explore this concept by considering our first case study. Jeff and I, along with our research collaborator Susan Rathbun-Grubb, engaged in a research project that showcased elements of collaborative and distributed research. The research focused on understanding the role of the PhD in libraries: who tended to have one, how it was used, and which "PhD" skills are of value to libraries.
  • It was collaborative because we were a team of three who worked simultaneously towards a common goal. While we had team goals, we also had individual goals, strengths, and interests in elements of the research.
  • It was distributed. Our team is based at three different sites in three different states: Maryland, South Carolina, and California. The only way we could meet, discuss, and interact was in online spaces. Therefore, all our research data, tools, resources, and activities needed to be online.
  • Our research is also open. We wanted to share and distribute our work beyond ourselves to other researchers. As a research team of practicing and former librarians, we strove to share all stages of our work through open access publications and public repositories whenever possible.
  • Our research had to be agile. As we know, research is a complex and iterative cycle, with lots of trial and error, and research directions can change throughout the process. Because of this complexity and variation, we needed our research workflow and tools to be powerful but also flexible, so that we could quickly adopt a research tool, test it out, and then use or abandon it.
  • So CDR emphasizes collaborative, distributed, open, and agile research technologies and practices. As researchers, we let these principles guide our work and tool selection. But we found that finding the right tools and managing our research output required a concerted effort at every step of our research process.
  • While we were a team of three doing "social science" research, there are certainly much larger examples of CDR. For example, Galaxy Zoo is a citizen science project. Some stats: it used an image data set of 1 million galaxies, and in the first year people contributed over 50 million classifications. The project published the images online with a system that allowed people to scan and classify galaxies. In this model, people contribute in different ways to a public database. CDR often involves research that would be difficult or impossible by other means, even with supercomputers. In contrast to Galaxy Zoo, our project was running on a very limited budget, so we sought off-the-shelf tools for our research.
  • When thinking about CDR and how to support it, it is important to understand the cycle of research. The circular model comes from a series of reports by the Research Information Network, a UK organization that studied how research is supported in university libraries. It includes four phases: idea discovery, funding approval, experimentation, and results dissemination. The "pre-publication phase" model comes from a series of articles written in D-Lib Magazine in 2007 by Anna Gold. As Gold asserts, libraries have historically focused on the outputs rather than the process of research. Much of our literature discusses supporting the results dissemination process, but little focuses on the other three areas. Some literature does describe support in these areas, for example helping to preserve products of the research process, including data, research logs, and research tools. Project Bamboo is doing a good job of documenting potential tools, there is growing exploration of digital research logs (e.g. OpenWetWare in biology), and libraries are launching new initiatives to support faculty in the creation of data management plans. At least anecdotally, we have found that it is difficult to provide support in these areas.
  • And if the process of research is not complex enough, researchers face an intricate and highly competitive environment in which funding, domain expertise, institutional pressure, and professional assessment are key factors. This makes it understandable that researchers have little time to devote to acquiring new skills! Librarians can struggle to find a place in this mix, because collaborations feature distinct roles: tasking, mentoring, technician, writer, researcher/analyst. These factors can make CDR more complex. Competing interests can be magnified across institutions, and collaboration, while good for securing funding, can complicate research methods and activities.
  • We would like to spend a few minutes talking about our own experience with CDR, the issues we had, and our approach to handling them. In doing so, we want to touch on tools and considerations in research design, IRB approval, data gathering, data management, collaborative analysis, and publishing/dissemination.
  • When we started our research project, the first step was planning our research design and workflow. There was a lot of brainstorming, sharing of research articles, drafting of research plans, re-writing, and sharing of these drafts. This creative and discussion-based process needed three important elements: coordination, communication, and creating collaborative documents. Lynne Siemens explores how digital services fit communication needs. She observes that researchers tend to use technologies to reconstruct traditional verbal and written interactions, that verbal, real-time interactions are an essential element, and that some things (e.g. thorny issues, team bonding, personal connections) remain really difficult to do via digital means. We all knew each other prior to this collaboration, which clearly gave us a "leg-up" in enabling collaboration.
  • We needed rich communication tools, easy-to-use scheduling tools, and simultaneous editing capabilities, so that we could communicate visually by video, audio, and text and share files easily. It was important to be able to metaphorically draw a diagram on a paper napkin during our online meetings, and this had to happen in real time so that the momentum of our brainstorming and discussions was not hampered. We used Skype for all our teleconferences, for audio and video communication, and for text messages. We could talk, show diagrams, and send each other strings of text like URLs very easily. Sometimes we had to demonstrate a file or an application on our individual computers, so we used the share desktop functionality. Additionally, we used join.me, a service that lets others see your desktop and lets you share control of it so that you can use applications together. Finally, we needed a way to create documents collaboratively: to document our work and then distribute control so that any team member could take ownership and make revisions. Again, Google Docs and Google Drive were very helpful. We also made use of Google Sites, which allowed us to create a website for our research project that helped publicize our research progress and gather feedback from colleagues.
  • Our data gathering process was relatively straightforward: we were using a survey, so a web-based platform was easy to select. We used an online survey platform (Qualtrics), which let us build the survey collaboratively online and allowed us all to view data as it was collected. Qualtrics is just one of many available tools, and there are some key considerations. For example, many IRBs require that surveys use SSL to gather and disseminate data, limit access to the researchers, and provide backups, and the platform also needs to be cost-effective. Some providers (e.g. SurveyMonkey) only offer these features at paid support levels.
  • For data management we used a mix of tools: Qualtrics stores our raw research data, Dedoose stores our analyzed data, and Google Drive stores backups of our data. Key features of data management include the ability to support ongoing data management and reporting and the ability to track versions and changes to documents. While Dropbox is a great solution for this as well, we tended away from it because Google was our common platform. Other researchers use Git and similar version control tools.
  • While it was relatively easy to find tools to help gather and manage data, it was more difficult to find cloud tools to analyze data. We had multiple coders, which meant inter-coder reliability testing and code coordination. We used a qualitative analysis method that included defining codes and applying these codes. There are several good tools out there, but few are truly cloud-based. We considered MAXQDA, but it posed a multi-client data analysis problem. The import process was also an issue: we had to write software to extract and slightly re-code the data. We considered a few alternatives, such as virtual clients on Amazon with a dedicated application, but Dedoose worked well. Dedoose came out of UCLA (Dr. Eli Lieber and Thomas Weisner) and is a great example of research turning into a wider project.
  • We are still working on publishing and are currently normalizing our data for publication. Our goal is to submit to ICPSR, an archive of social science research data. ICPSR requires depositing the data and the documentation needed to independently read and interpret the collection, and it works with researchers to normalize and archive data following submission. We included this in our IRB protocol and consent, but the nature of our data (long, reflective statements) makes preparing it for sharing difficult.
  • We found that even when conducting research online, it was difficult to maintain version control and access to files. Two issues stood out. The first was CDR literacy: how do we find and select the proper tool, and then how do we use these tools effectively? Previous expertise helped, but it still took time! The second was management and support: how do we make sure we have the access level we need, and how do we pay for these tools? The only tool we paid for was Dedoose, at $150 for one year. We used a grant through UC Berkeley to support this work, which shows the potential of micro grants in libraries.
  • Before we move on to our next two case studies: how did we, or how could we have, used libraries to support our process? Libraries have so far focused on supporting the end products of research; the CDR model is an opportunity for libraries to support research activities. We needed funding (a micro grant from UC Berkeley supported us), tool evaluation skills (our own information literacy expertise served us well), and guidance in IRB and data management. Libraries do support researchers in these three ways, but clearly outreach is an issue: in our case study, we only turned to libraries for funding! There are many ways in which libraries can contribute if they can get researchers' attention: coordination (scheduling, documenting ideas, sharing files); communication (rich discussion through video, text, audio, application, and file sharing, both synchronous and asynchronous); collaborative writing (sharing drafts, re-writing online, version control); data collection (gathering and managing digital data, online storage and sharing, metadata assignment); data analysis (collaborative approaches, web/shared applications for analysis); and sharing (repositories, open access principles, scholarly communications practices).
  • With a baseline understanding of CDR and how we think libraries can support it, let's turn to two other case studies that address the question of "literacy" in more detail. In our own experience, our ability to quickly identify and evaluate tools was central to our success, but as we have often observed, this can be a difficult process for researchers without those skills. Our second case study focuses on Jeff's efforts to reach out to chemistry graduate students (doctoral and master's). Based on his experience in our research, he developed an instructional and liaison approach to supporting researchers in CDR, rather than one centered on deploying technologies and systems at the library. At Berkeley, they are engaging with CDR elements through data management literacy. This approach aligns with their core strengths and builds on existing programs in information literacy, scholarly communications, and open access.
  • The UC program defines data management literacy in terms of the skills and practices required to create high-quality research data that is safely stored and shareable. Data management deals with CDR issues: research teams are collecting data collaboratively, and they share this data across teams and with researchers worldwide. Data management is also an important issue for our research patrons, because funding agencies like the NSF and NIH now require data management plans or data sharing. So at Berkeley, Jeff designed a curriculum for data management literacy that addressed the workflow of data from collection to publication.
  • Their curriculum covers the stages of data in the research life cycle: planning for data management, saving data, describing data, sharing data, and ethical use. Data analysis was omitted because their patrons are the experts in that.
  • The curriculum expanded on each of these areas, focusing on practice and tools. Our handout provides a link to this curriculum, but here are some highlights. For each data activity, they discuss the practices and literacies involved and then demonstrate the associated technologies, tools, and concepts. For the technologies covered, the curriculum focuses on tools that are cloud-based, collaborative, and distributed. It also emphasizes tools provided by networks and consortia, particularly the work of the California Digital Library, so that there is a community of interest and support. This curriculum was based on the instructional programs for data management at MIT and the University of Minnesota, but the UC Berkeley instructional approach emphasized quick and practical tasks and tips for the different stages of the research life cycle. The goal was to show researchers some relatively easy and quick things they could begin doing right now to protect and manage their data according to best practices. We turned our curriculum into classes that we gave for a campus research ethics seminar and for student orientations at the beginning of the year. We also created guides, videos, and websites that fostered self-instruction. Finally, we gave a training class to liaison librarians, framed in terms of responding to common questions that our patrons may have about data management. Besides instruction, we wanted to provide support to our researchers through a team of liaison librarians who work together to provide specialized reference consultations and outreach to researchers. We have a listserv email address for patrons to get in touch.
  • In our class, we highlight three data management tools developed by the University of California Curation Center (UC3) at the California Digital Library. EZID is a service that lets you generate permanent identifiers like DOIs. Merritt is an online repository service. DMPTool is an online service for building data management plans, with step-by-step instructions for meeting funding agency requirements. End of this case study; sorry, I have no high-level analytical thoughts on this one!
  • And now for our third case study: I teach 670, Information Organization. I was inspired to integrate CDR and data service learning into this class. It was a good fit with the content and a good opportunity to focus on curriculum integration. Job descriptions now focus on these concepts, and students study them very early on to figure out what type of job they will pursue. UM is completing a curriculum redesign featuring more integration in core classes. What did I do? I incorporated CDR and data management skills into the curriculum. Information organization: use categories of tools (data analysis, version control, data sharing), and introduce metadata standards and methods for data curation. Data analysis: work with metadata as a "first class object" using Google Refine. Information technology: design data-rich information systems that support a data-intensive activity (Viewshare). Virtualization: build meta-literacies in students by engaging them with cloud-based tools (VCL, Viewshare).
  • So, for example, rather than emphasizing MARC or Dublin Core alone, we use these as jumping-off points for a wider range of metadata standards and types. We integrate skills in user needs analysis, metadata standards selection, information system design, and data visualization to create a digital library that serves a specific community or data need. How will it work? Too soon to know. The LIS community is increasingly turning towards integrated curricula, however, and towards "data" (at least at the moment!). Illinois, Simmons, Maryland, and Chapel Hill, among others, are launching new or data-specialized programs; notably, Simmons is offering a digital curation program that weaves content together from among its courses.
  • So what can we learn from these three case studies? CDR and data services are entering the mainstream in research, in teaching, and in traditional library work. As researchers and librarians, we found we needed tool literacy but also naturally turned inward: we only reached out when we needed money! For LIS educators, the emergence of the "data services" librarian indicates an interesting trend, and one they can meet through curriculum integration. We also found in our research that there is an increasing call for multi-domain and research expertise in librarians. So, should libraries support CDR, and if so, how? There is sufficient evidence that researchers have a need in this area: the growth of inter- and cross-disciplinary work means that there is a need, and often room, for an "information specialist". The suite of tools to support CDR has expanded greatly, and many tools are specialized for certain domains and platforms. Furthermore, the rise of paradigms like digital humanities, e-science, and emerging scholarly communications models of rapid publication that foster online discussion (like PLoS ONE and eLife) will only drive the need for collaborative and distributed research efforts. And if libraries support CDR, how should they do it? There is a growing literature in this area, which suggests three broad areas: building collaborative meeting spaces; research training and support, including holistic information literacy and teaching and learning support; and scholarly communications, including alternative publishing models. In conclusion, I would like to consider three broad areas in which library support of CDR and data services is valuable.
  • First, pro-active support. I decided to feature some quotes from our research project (case study 1) that addressed this question, at least indirectly. In our own experience, we found that knowing about services (e.g. the small grant program) helped us build our CDR toolkit, even if we did not turn to the library for tool selection. Proactive support was a theme in some of our research findings: participants suggested that librarians could expand outreach to include suggesting potential collaborators and research or grant avenues. This is a common theme among librarians in this space; using connections and willing collaborators, they are building programs "one brick at a time". This support can take the form of funding, grant ideas, or suggested collaborators. By their nature, researchers are inwardly focused and often have difficulty connecting with collaborators from other disciplines.
  • Second, literacy. Jeff found that a multi-pronged approach at UC has been effective in spurring CDR support, but he also found that graduate students are more receptive. At the same time, current researchers and faculty have immediate needs in this area: finding and implementing the right tools to support CDR data services can be difficult. This is harder in CDR and requires a closer partnership. Some libraries, e.g. New Mexico, are doing this through PhD librarian/faculty partnerships. We found in our instruction cases that reaching out to graduate students was much more effective. Douglas Rushkoff talks about this issue in his book “Program or be Programmed”, observing that our IT literacies are often a generation behind our technologies. Fortunately for us, generations in academia run in 5-7 year cycles! But this view of literacy goes both ways: as librarians, we need to become much more adept at research methods.
  • And finally, infrastructure support. Jeff is leveraging institution-wide tools from CDL, but there are also smaller-scale tools that are useful. The cloud is making it both easier for libraries to support customized CDR environments and more difficult to keep up with them. The cloud changes the economic equation for CDR: most small and mid-sized projects could be supported with very little funding. OCLC and the Research Information Network commented on the fact that faculty resist institutional "one size fits all" support, so libraries can meet more situation-specific needs. Two examples: Project Bamboo has a good database of data and research tools (some web-based, others not), where the challenge is connecting tools with CDR activities; and MyNetResearch seeks to connect collaborators with funding and literature. The cloud is also making it possible to deploy on-demand resources: Labslice offers non-persistent virtual environments, and libcc.org is a persistent Python-based framework for virtual labs. This fall I am piloting a virtual computer lab that gives my students a persistent desktop environment.
  • To wrap up: providing support for CDR is not unlike our efforts in institutional repositories, data management, and other research support services. In our case study exploration we found a need to support a wide range of activities and services. There is a big problem, however: researchers do not turn to libraries. A 2010 OCLC research project on virtual research environments discusses this and found that faculty are very inwardly focused and are driven by funding and publication forces. Reaching out to graduate students is important for this reason, as are the ongoing efforts of libraries to connect with researchers as "equal partners". CDR can be new for many people, and they may need support in these areas. When the library enters this arena, we have an opportunity to add value through our unique perspectives on information literacy and scholarly communications. Efforts at UC Berkeley show a productive collaboration between CDL and liaison work, and LIS instruction is testing a method for developing CDR expertise as a byproduct of core LIS instruction. There are also new approaches to supporting short-term infrastructure. As we have seen at this conference, libraries are very active in exploring how to build these programs.
  • Thoughts? Questions?
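The notes on collaborative analysis mention inter-coder reliability testing among multiple coders. The talk does not say which statistic the team or Dedoose used. A common choice for two coders is Cohen's kappa, which corrects raw percent agreement for the agreement expected by chance. The following is a minimal, hypothetical Python sketch. It assumes each coder assigned exactly one code per excerpt, and the function name `cohens_kappa` and the sample codes are illustrative only.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(coder_a: Sequence[Hashable], coder_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two coders who each assigned one code per excerpt."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must code the same, non-empty set of excerpts")
    n = len(coder_a)
    # Observed agreement: share of excerpts where both coders chose the same code.
    p_observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Expected chance agreement, from each coder's marginal code frequencies.
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    p_expected = sum(freq_a[code] * freq_b[code] for code in freq_a) / (n * n)
    if p_expected == 1.0:
        # Both coders used one identical code throughout; kappa is undefined.
        raise ValueError("kappa is undefined when expected agreement is 1")
    return (p_observed - p_expected) / (1 - p_expected)


if __name__ == "__main__":
    # Hypothetical codes applied by two coders to the same six survey excerpts.
    a = ["skills", "career", "skills", "research", "career", "skills"]
    b = ["skills", "career", "research", "research", "career", "skills"]
    print(round(cohens_kappa(a, b), 3))
```

With these made-up codes, the coders agree on 5 of 6 excerpts (0.83), chance agreement is 0.33, and the script prints 0.75. Before comparing results, check which reliability statistic your own analysis tool actually reports.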

    1. Understanding data services in collaborative and distributed research settings. Erik Mitchell, Jeffery Loo
    2. Purpose of our talk: Essential elements of CDR; Implications for libraries and LIS education; Approaches to CDR support. http://tinyurl.com/cdr-lita12
    3. Collaborative and Distributed Research: Simultaneous team and individual work
    4. Collaborative and Distributed Research: Online spaces and tools to bring geographically diverse collaborators together
    5. Collaborative and Distributed Research: Dissemination via open access and public repositories
    6. Collaborative and Distributed Research: Powerful, collaborative, distributed, and easy to adopt, build, or abandon research tools
    7. Collaborative and Distributed Research: Simultaneous team and individual work; Online spaces and tools to bring geographically diverse collaborators together; Dissemination via open access and public repositories; Powerful, collaborative, distributed, and easy to adopt, build, or abandon research tools
    8. CDR in action: http://blog.galaxyzoo.org/
    9. Research cycles: http://www.soas.ac.uk/careers/earlycareerresearchers/file69090.pdf; http://www.dlib.org/dlib/september07/gold/09gold-pt1.html
    10. Research context: MacColl, John and Michael Jubb. 2011. Supporting Research: Environments, Administration and Libraries. http://www.oclc.org/research/publications/library/2011/2011-10r.html
    11. A case of CDR: Research design [Coordination / Communication]; IRB approval / Data gathering; Data management; Collaborative analysis; Publishing / dissemination
    12. Research design: Coordination; Communication; Creating collaborative documents. Brainstorming; Sharing; Drafting and re-writing
    13. Coordination / communication / documents: Video, audio, and text-based communication; Remote screen sharing software; Share control of your machine; Simultaneous editing; Scheduling
    14. IRB approval / Data gathering: Web-based survey platform; Project information; IRB approval for open data; IRB approval for distributed research
    15. Data management / analysis: Data management via survey platform; Shared code dictionaries; Version control; Document backup / syncing
    16. Collaborative analysis
    17. Publishing / dissemination: Domain specific; Data / metadata (data file, documentation, description); Confidentiality / anonymity
    18. Challenges in CDR: Research design [Coordination / Communication]; IRB approval / Data gathering; Data management; Collaborative data analysis; Publishing / dissemination
    19. Implications for library support: Supporting research activities; Many CDR technologies and literacies to support. Coordination: scheduling, documenting ideas, sharing files. Communication: rich discussion; video, text, audio, application, and file sharing; (a)synchronous. Collaborative writing: sharing drafts, re-writing online, version control. Data collection: managing digital data, online storage and sharing, metadata. Data analysis: collaborative approaches, web/shared applications for analysis. Sharing: repositories, open access principles, scholarly communications.
    20. Instructional and liaison approach: Support CDR elements in information literacy, scholarly communications, and open access; Data management literacy
    21. Data management literacy: Skills/practices for high-quality data, safely stored data, and shareable data; Motivated by funding agency requirements
    22. Curriculum: Data planning; Saving data; Describing data; Sharing data; Ethical use
    23. Activity | Practices and literacies | Technologies, tools, concepts
        Planning for data management | Preparing data management plans | DMPTool (an online service for building data management plans with step-by-step instructions for meeting funding agency requirements)
        Saving data | Evaluating storage needs and providers | Storage options: external hard drives, departmental servers, cloud storage, institutional and public archives/repositories
        Saving data | Safe and secure data storage and backup | Online storage, backup services and practices, encryption tools
        Saving data | File saving for long-term access | File formats
        Saving data | Long-term storage / archiving | Merritt (the UC online repository service)
        Describing data | Metadata assignment (preparation for re-use) | Metadata standards
        Sharing data | Motivations and benefits of data sharing | Funder, journal, and institutional policies on data sharing; research evidence of data sharing benefits
        Sharing data | Data sharing approaches | Self-archiving websites/blogs, repositories, open access, ethical and legal requirements
        Sharing data | Internal data sharing (among research collaborators) | Online data sharing services (Dropbox, Google Drive, etc.)
        Sharing data | Data publishing: identifying public repositories | http://databib.org/ and http://thedatahub.org/
        Sharing data | Assigning permanent identifiers | EZID (a service for generating permanent identifiers: DOIs and ARKs)
        Sharing data | Data licensing | Creative Commons CC0 Declaration, Public Domain Dedication
        Ethical use | Data citations | Citation styles
        Ethical use | Following data licenses | Intellectual property and licensing
    24. Tools from UC3/California Digital Library: for generating permanent identifiers (EZID); online repository service (Merritt)
    25. LIS education: Preparing LIS students to support CDR / data management services. Develop metadata skills: expand range of metadata standards; connect metadata and data analysis abilities. Emphasize CDR literacy: teach using online skills / tools; introduce cloud, distributed platforms; encourage distributed collaboration. Data research projects: designing and implementing a data-rich digital library; data-centric user needs analysis. Build IT literacy: introduce visualization methods / technologies.
    26. Data literacy in Information Organization: Communities of users; Visualization techniques. http://www.dlib.indiana.edu/~jenlrile/metadatamap/seeingstandards.pdf
    27. Why should libraries support CDR? Three studies, three themes on CDR: Information literacy; CDR literacy / research liaison work; Data literacy in LIS
    28. Pro-active support: “The most interesting way I have seen this being promoted lately is through tools . . . that assist in bringing together essential information though keywords, resume and grant information so others can search for persons with like or needed interests and skills”
    29. Literacies: “I found that working in an academic library required me to develop a lot of competencies and interests that I'd never been encouraged to develop while working on my Ph.D., but which would probably have made me a better Ph.D. student if I'd learned them earlier”
    30. Infrastructure: Customized cloud environments (http://labslice.com, http://libcc.org); Data application exchanges (http://dirt.projectbamboo.org)
    31. Supporting CDR: Essential elements of CDR; Implications for libraries and LIS education; Approaches to CDR support. http://tinyurl.com/cdr-lita12
    32. Jeffery Loo, Cheminformatics Librarian, University of California, Berkeley, jeff@jeffloo.com. Erik Mitchell, Assistant Professor, College of Information Studies, University of Maryland, erik@umd.edu
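The curriculum's "safe and secure data storage and backup" row raises a practical question, and so does the team's use of Google Drive for backups: does a backup copy actually match the original? Neither the talk nor the curriculum names a method. One common approach is a checksum (fixity) manifest. The Python sketch below is hypothetical and assumes the data and the backup are both reachable as local folders, for instance through a sync client. The function names are illustrative and are not part of DMPTool, Merritt, or any other tool named above.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_manifests(original: dict[str, str], backup: dict[str, str]) -> dict[str, list[str]]:
    """List files missing from the backup, extra in it, or with different content."""
    return {
        "missing": sorted(original.keys() - backup.keys()),
        "extra": sorted(backup.keys() - original.keys()),
        "changed": sorted(
            name for name in original.keys() & backup.keys()
            if original[name] != backup[name]
        ),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src, bak = Path(tmp, "data"), Path(tmp, "backup")
        src.mkdir()
        bak.mkdir()
        # Hypothetical project files.
        (src / "responses.csv").write_text("id,answer\n1,yes\n")
        (src / "codebook.txt").write_text("skills: PhD-related skills\n")
        # Hypothetical backup: one file copied unchanged, the codebook missing.
        (bak / "responses.csv").write_text("id,answer\n1,yes\n")
        print(compare_manifests(build_manifest(src), build_manifest(bak)))
```

For the sample folders, this prints {'missing': ['codebook.txt'], 'extra': [], 'changed': []}. You can store a manifest like this alongside the data. Later copies, including files deposited in a repository, can then be checked against it.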