The document discusses trailblazing in research data management. It defines key terms like data, data management, and big data. It outlines why various stakeholders like funding agencies, universities, researchers, and libraries are venturing into research data management. It reviews assessments of data management needs conducted at various universities, examples of existing research data management programs, and available tools and resources. Finally, it discusses how institutions can blaze their own trail in research data management by identifying needs, partners, priorities, and potential services and policies to develop.
1. Trailblazing in the Wilderness of Data Management
Where are we going and how do we get there from here?
Stephanie Wright
Data Services Coordinator
University of Washington Libraries
2. AGENDA
• Definitions
• Why venture out
• Paths already taken
–Assessments of needs
–Existing programs
–Tools & resources
• Blazing your own trail
Montana State University – 21 June 2013
3. Definitions
• Data
• Data Management
• Big Data
• Long Tail of Data
• Acronyms
www.lib.washington.edu
4. Definitions
DATA
By data, we do not mean a synonym for information. We mean research data, that which is collected, observed, or created, for purposes of analyzing to produce original research results.
Research data may be created in tabular, textual, statistical, numeric, geospatial, image, multimedia or other formats.
(Adapted from DISC-UK DataShare Project, p. 16)
5. Definitions
DATA
Data can be produced from a variety of processes (e.g., observation, experimentation, simulation, derivation, compilation), represented in numerous forms and stored in many digital formats (e.g., ASCII, PDF, SPSS, Excel, TIFF, Java, FITS, CIF, ZVI).
The scope of this definition includes data from disciplines in the sciences, social sciences, and humanities.
(Adapted from MIT Libraries, “What is Data?”, 2009)
6. Definitions
DATA MANAGEMENT
Pertains to the collection, cleaning, storage, sharing, access, disposal, preservation and/or archiving of research data.
(Adapted from University of North Carolina, Research Data Stewardship Report, 2012)
13. Researchers
• Verifiability & reproducibility
• Increased citation rates for publications
– (Piwowar et al., 2007)
• Preservation of individual scholarly record
• Save time by planning early
14. Libraries
• Digital Preservation Network (DPN)
“The Digital Preservation Network is being created by research-intensive universities to ensure long-term preservation of the complete digital scholarly record.”
http://d-p-n.org/
15. Libraries
NSF Proposal & Award Policies & Procedures Guide (Oct 2012)
“Instructions for preparation of the Biographical Sketch have been revised to rename the "Publications" section to "Products" ....
(P)roducts may include, but are not limited to, publications, data sets, software, patents, and copyrights.”
16. Paths Already Taken
• Assessments
• Existing programs
• Tools & Resources
Image credit: John W. Ridge
(http://commons.wikimedia.org/wiki/File:Yellowstone_Trail_Map.jpg)
17. Assessments
• UNC (2012) “Research Data Stewardship Report”
• University of Colorado Boulder (2012) “Research Data Management @ UCB”
• Purdue “Data Curation Profiles Directory” (http://docs.lib.purdue.edu/dcp/)
• More: Georgia Tech, Cornell, Houston, Oregon…
18. Findings
• Researchers use a wide variety of data types – across disciplines
• Most researchers rely on themselves for data management
• Researchers want to maintain control of their data
• Many are unaware of existing services
• They want tools that work in existing workflows
20. Existing Programs
• Cornell
– Research Data Management Service Group
• Sr VP for Research and University Librarian
• Faculty Advisory Board
– 9 faculty across disciplines
– OSP & Office of Research Integrity & Assurance
• Management Council
– 2 librarians, 2 faculty, 2 IT, 1 research institute
21. Existing Programs
• Purdue
– D2C2: Distributed Data Curation Center
• Executive Committee
– Dean of Libraries, VP of Research & VP of IT
• Library: consulting & metadata support
• IT: storage & research computing support
22. Existing Programs
• University of Washington
– Data Services Program (1.5 FTE)
• Data Services Coordinator
• Data Services Communications & Curriculum Librarian
– Data Services Team (10 members)
– Partnerships
• Research Centers (eSci, CSDE, IHME)
• Office of Research (OSP)
• Campus IT
• iSchool
24. Blazing Your Own Trail
Image credit: Michigan State University Department of History,
HST 321: History of the American West
(http://history.msu.edu/hst321/files/2010/07/colter.jpg)
25. Where do you want to go?
• Identify needs
• Consider potential partners
• Scope
– Disciplines
– Specific areas of the data lifecycle
• Determine priorities
– New services? Enhance existing? Market existing?
26. MSU Strategic Plan
• Objective L1
– Assess and improve where needed, student learning of critical knowledge & skills
• Objective D1
– Elevate the research excellence and recognition of MSU faculty
• D1.2
• Objective D2
– Enhance infrastructure in support of research, discovery and creative activities
27. Campus IT
• Support for active data storage
• Data security guidance
• Backup services
• Development of tools that can be inserted into existing workflows
28. Office of Research
• Guidance on legal / ethical considerations
• Incorporate DM planning into grant submission process
• New faculty data management orientations
29. Libraries
• Market and provide access to existing RDM resources
• Provide learning opportunities on RDM best practices
• DMP consultation
• Storage (final)
• Metadata consultation
32. Stephanie Wright
Data Services Coordinator
swright@uw.edu
@shefw
http://guides.lib.washington.edu/swright
Data Management Guide
http://guides.lib.washington.edu/dmg
ResearchWorks Data Services
http://researchworks.lib.washington.edu/rw-data.html
Editor's Notes
Here is where I admit that perhaps my use of the terms trailblazing and wilderness of data management might have been colored by the fact that y’all are so close to Yellowstone, which has been one of my favorite places to visit since I was a child. But I defend my use of those words and hope to convince you over the next hour or so that I wasn’t really venturing too far into the realm of hyperbole when I came up with that title.
Here is my map for this little journey. And here I want to take a moment to let you know that we have arranged for Q&A time at the end of my presentation portion, but I also want you all to feel comfortable stopping me at any time and asking questions as I go along. Data management is a multi-faceted topic and I don’t want you to feel like you have to remember your questions until I’m done yakking and then say “Remember that slide you had up 20 minutes ago?” I also recognize that people are at varying levels of understanding of the issues surrounding data management. In reality, everyone is new to this. I understand not everyone reads data management needs assessments for fun. Please don’t be afraid to ask me to clarify anything.
I don’t want to get bogged down in terminology & definitions, but I do want to make sure that I’m not speaking a different dialect or even a different language up here so I’ve outlined a few terms where I thought it might be useful to have some clarification.
First, there’s “data”. You would not believe how many definitions there are for such a tiny word. This one used to be my favorite definition and was the one we used in the research data management needs survey we conducted last fall. It’s adapted from the DISC-UK DataShare Report and I like it because 1) it’s short and 2) it doesn’t overtly align itself to a particular discipline or data format. It can be textual, images, videos, computer models… it’s all data. And when we’re talking data services, at least at UW, we’re mostly looking at supporting digital data services.
Even with this definition, some folks (usually in the Humanities) don’t see what they do as “data”. So I’ve added another piece to my favorite definition.
This is adapted from an MIT Libraries definition and I like it because it adds the variety of processes that can be used in the collection of data, as well as specifically stating that it is discipline-agnostic. I don’t know if I would have gotten more responses from our Humanities colleagues on our survey if I had added this to the definition, but when we get around to doing our focus groups with those researchers, I will ask them.
Now there are many processes that data goes through. I already mentioned collection, but just as there is a lifecycle associated with research, there is also a lifecycle associated with data. There are a multitude of data lifecycle models out there. In essence, data management pertains to the various processes involved in managing data through the entire data lifecycle – from planning and collection, all the way through to preservation and archiving.
This is not my favorite term but one hears it so much these days, I feel I need to talk about it.
Many people refer to big data as data that are high volume, high velocity, and/or high variety information assets that require new forms of processing for decision making and insight.
Large amounts of data (gigabytes, petabytes, yottabytes)
Highly complex sets of data / flat schemas, few complex interrelationships
Loosely structured data… or highly structured
Technology that handles large and complex data sets
Process for analyzing large and complex data sets
Data sets that can generate insights previously impossible
Availability of massive amounts of data
In short, “big data” can mean any # of things, which is why I don’t use the term. So moving on.
This is the term that probably requires the most explanation and you will probably most frequently hear it used in conjunction with the previous term because this is usually what “big data” is not and this graph actually explains it pretty well.
The vertical axis (the up and down line) is Frequency of Use. The horizontal axis (side to side) is the total inventory of data – everything, all collected data. The green part represents datasets that are popular, widely used and well managed (think of all the climate data collected and maintained over a hundred years by the National Climatic Data Center). The yellow part represents datasets that are less frequently used and are managed in some informal manner (maybe on departmental shared network folders). The red part – that is the long tail. It’s data that is rarely used and not managed in any kind of organized fashion. And it’s estimated that it’s 80-85% of all data collected.
It’s that red part where many organizations tend to focus with data services because that’s where the needs are greatest. It’s the data that a researcher collected 10 yrs, 5yrs, 1 yr ago that’s sitting on a floppy, a CD or a thumb drive in, or worse, under a researcher’s desk. You may notice that size of the dataset is not represented on this graph. Size is not a factor in determining if a data set falls into the long tail.
I try to avoid acronyms but after the first ten times, even I get tired of saying research data management over and over.
Don’t think I need to define RDM any further since I already specified research data in my definition of data and defined data management.
IR – Central location for storing an institution’s digital assets and intellectual outputs (e.g., MSU’s ScholarWorks)
DR – Repository specifically designed for storage and access to data sets. Can be part of an IR.
DMP – A document outlining how a researcher plans to manage data during and after a research project, including how it will be organized, maintained and shared.
Alright, definitions done. On with the meat of it. So why is data management such a hot topic? Why do we even need to do anything differently than we’ve been doing in the past? I’m going to break things down by the different players involved.
As long as empirical research has existed, researchers have been doing “data management” in one form or another. However, funding agency mandates for doing formal data management are relatively recent.
1998 – NSF instituted DMP requirement
2003 – NIH implemented data sharing policy
2011 – NSF more strongly enforced DMP requirement
2013 – NSF changed merit review criteria for grant proposals to allow inclusion of datasets (Jan?); OSTP mandate for public access to federally funded research (Feb); OMB mandate for government Open Data (May); NIH enforcing public-access policy
http://grants.nih.gov/grants/policy/data_sharing/
http://www.nsf.gov/pubs/policydocs/pappguide/nsf13001/gpg_sigchanges.jsp
http://scholarlykitchen.sspnet.org/2013/02/25/expanding-public-access-to-the-results-of-federally-funded-research-first-impressions-on-the-us-governments-policy/
http://www.federalnewsradio.com/513/3316130/White-House-mandates-open-data-releases-new-tools
http://www.nature.com/nm/journal/v19/n1/full/nm0113-3.html
By providing support for data management, universities increase the competitiveness of their researchers for obtaining grants
Maximize potential of researchers as they can reuse data already collected by others.
Don’t think I need to say anything extra about that third point.
Encourages innovation & discovery by allowing researchers to think of research questions in new ways using existing data
As it gets harder and harder to obtain dollars for research, researchers are under increasing scrutiny to be able to verify their research. By following data mgmt best practices, you can produce your data and the associated documentation if needed to verify and reproduce your research.
Heather Piwowar and friends published research in 2007 showing that publications with publicly available associated data had a 69% increase in citations.
Let me tell you a story about a Nursing faculty member I interviewed as part of our RDM survey and follow-up interviews project this year. The week I went to interview her, she had just been told that the IT folks could not recover the over 30 years of research she had been saving on the departmental server when the hard drive failed. And when I say 30 years of research, I mean all her papers, her data, her codebooks, her scripts. Everything. Gone. She did not have this research anywhere else because she was under the assumption that it was being backed up.
On that last point, planning for data mgmt tasks at the beginning of the research process is a lot less time consuming than doing the data forensics after the research project is over, 5, 10, 15 years down the road.
Alright, so why am I including libraries in this?
Earlier this year the Digital Preservation Network (or DPN) was formed and the UW became a member. While not focused solely on data, its mission is to “ensure long-term preservation of the complete digital scholarly record”. Not just ejournal articles and ebooks, the COMPLETE digital scholarly record, and data certainly is part of that. And if you’re wondering what this has to do with libraries, look at that last part of the mission statement again. Take out “digital” and isn’t that why academic libraries were born?
Data is recognized as a valuable scholarly output. The NSF made that even more explicit in October of last year when it made this change to its Proposal and Award Policies and Procedures Guide.
If libraries don’t step up to the plate and provide data management support, everybody is going to try to figure out their own way to do it, one that meets their individual needs. To put it in perspective, imagine if every department on campus was maintaining the books and journals in their own subject areas. This is what Libraries are supposed to do… it’s why we’re here.
And now it’s time to take those skills librarians have always used to do those same things we’ve always done for traditional scholarly outputs and adapt them to meet scholarly data needs. Skills like how to organize information, metadata creation, providing access to information. Reference skills in particular are key: ability to liaise, to communicate across disciplines, to refer, consult, to teach.
Off my librarian soapbox… for now.
There has been a lot of work done in this area over the last several years. In order to get to the next section, about blazing your own trail, I thought it might be helpful to look at what’s already been done. There have been several data management needs assessments, there are some existing programs to look at, and a lot of useful tools have been developed to help support data mgmt needs.
In my former life, I was an assessment librarian so it does my heart good to see so many folks out there that have been doing needs assessment for data management. I’ve listed a few of my favorites here. I’m extremely impressed with what Purdue has done with their Data Curation Profiles and they have now created a directory of profiles from not only Purdue, but profiles submitted by other institutions, as well. I’ve mentioned that we did a survey & interviews recently, though we haven’t yet published our results. I did get to present our preliminary findings at a conference recently with Georgia Tech, Cornell & Purdue and though there were differences in our methodologies, populations and findings, there are some needs that keep coming up across multiple assessments.
Wide variety of data types, wide variety of file sizes
It is centered in the Research Department of the Purdue University Libraries. D2C2 comprises four core researchers who work closely with subject specialist liaisons in discipline areas throughout the Libraries.
3 FTE who work with subject librarians
An open source tool helping researchers document, manage, and archive their tabular data, DataUp operates within the scientist's workflow and integrates with Microsoft® Excel.
Tool for helping people identify and locate online repositories of research data.
Rates the current state of the researcher’s data management practices. The system compares the information collected during the data interview process with these data management best practice statements. A framework for comparing and improving departmental data management practices.
Alright, so we’ve talked about why data management is important and what’s been done in the area so far. Now let’s walk forward on how to provide support here at MSU.
Let’s start with your strategic plan because you already have objectives listed there where parts of a data services program would fit in nicely.
Develop a separate RDM strategic plan
I won’t go into the whole strategic planning process… there are several ways to go about it. UW Libraries uses the Balanced Scorecard system for its strategic plan. The Data Services Team and I have been working on a logic model to help us develop our programmatic strategic plan.
Here are a few things to consider.
You already have some starting points. Look at the MSU strategic plan.
Data management isn’t just important for current researchers, but for future researchers as well. At UW, we are developing data management learning opportunities for librarians, faculty & students. Consider the integration of data literacy into grad research methods courses.
D1.2 specifically mentions measuring achievement in this objective thru peer-reviewed publications and journal citations. I would suggest including in here other alternative metrics such as data set downloads and citations, and reuse of existing datasets for new research.
D2 Sounds like you are already on your way with your recent release of the IR ScholarWorks. If so desired, you can also use your IR to support data management by allowing for the deposit of data sets in your IR.
Now I’m spending the rest of my day after this presentation talking with different groups on campus so I can’t even begin to make any specific recommendations, but here are a few ideas. Some possible roles for campus IT.
When I say active data storage, I’m talking about storage during the phase of research where data is being actively collected, accessed, manipulated and shared among collaborators. As opposed to the final version of a data set that is preserved for future reuse.
Here are a few ideas for Office of Research.
At UW we’re working with our Office of Sponsored Programs on that last bullet point. In a recent meeting, we talked about looking into the feasibility of coordination between my shop and OSP when a researcher is submitting a grant proposal to a funding agency that requires a DMP.
I’ve already mentioned how librarians have certain skillsets that are conducive to data management support. Here is just a smattering of possible services they can provide. At the UW we provide all of these, though not at as high a level as I would like, but we’re working on that.
And that’s something to keep in mind, as well. You don’t have to come out of the gate with everything polished. We sure didn’t. When the NSF announced it was enforcing the DMP mandate, I threw up a quick and dirty LibGuide on DMPs. The next year I rec’d a Friends of the Libraries grant to develop a more robust data management guide. It’s better, but it’s still not the site I want it to be.
Consider what can be done at a broad university level, as well, not just by individual groups on campus. Here are a few suggestions on that front.
In short, research data management services work well with the saying “It takes a village.” There are lots of parts to be played and there are some units more suited for fulfilling certain roles than others. There are many things that can be done to support data management. Some are low hanging fruit; for some you might need a stepladder.
The key is to do something. Because doing nothing really isn’t an option.