The document provides an introduction to data management for librarians, outlining key concepts such as the research data lifecycle, challenges in managing digital data over time, best practices for organizing, documenting, and storing data, and resources for data management support. Common problems include difficulty locating, accessing, and understanding data in the long run without proper planning and preservation strategies. The role of librarians is to educate researchers on best practices and provide support and training resources.
Data Management for Librarians: An Introduction
1. Data Management
for Librarians:
An Introduction
February 19th 2013
Gareth Knight
Manager
RDM Support Service
2. What is Data?
“Data are facts, observations or experiences on which an argument, theory or
test is based. Data may be numerical, descriptive or visual. Data may be raw or
analysed, experimental or observational.”
http://research.unimelb.edu.au/integrity/conduct/data/review
May originate from various sources:
Primary and/or secondary
May contain different content:
Quantitative and/or qualitative
May be expressed in different forms:
Datasets, still images, audio‐video, audio recordings, interactive resources
May be held in a number of variations:
Raw, cleaned, anonymised/pseudonymised, analysed
May be encoded in different formats:
MS Excel, TIFF, MPEG2, STATA, FoxPro
What type of data do you have at home?
3. Data in the Research Lifecycle
Diagram: the research lifecycle shown as a cycle.
Brainstorm → Develop Proposal → Plan Project → Perform Research → Write-up Results → Finalise & submit
4. Data in the Research Lifecycle
Diagram: the research lifecycle, with a data management plan produced at the proposal stage.
Brainstorm → Develop Proposal (Produce Data Management Plan) → Plan Project → Perform Research → Write-up Results → Finalise & submit
5. Data in the Research Lifecycle
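Diagram: the research lifecycle, with the Perform Research stage expanded into data activities.
Brainstorm → Develop Proposal → Plan Project → Perform Research → Write-up Results → Finalise & submit
Data activities within Perform Research: Create /, Describe, Store, Analyse, Share, Reuse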
6. Data in the Research Lifecycle
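Diagram: the research lifecycle, with Share and Archive added at the Finalise & submit stage.
Brainstorm → Develop Proposal → Plan Project → Perform Research → Write-up Results → Finalise & submit (Share, Archive)
Data activities within Perform Research: Create /, Describe, Store, Analyse, Share, Reuse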
7. What is Data Management?
1. Plan
• Determine requirements
• Identify risks & opportunities
• Decide approach
2. Implement
3. Monitor
• Evaluate approach
• Change approach/perform corrective action
4. Evaluate
• Is it fit for purpose?
• What additional action is needed?
‘Benign neglect’ and poorly made short-term decisions have long-term implications
8. Short-term decisions
with long-term implications
Software products
File formats & standards
Data organisation & labelling
Quality controls
9. Why does data need to be managed?
Ensure data can be located
Enable analysis
Ability to understand for current and future need
Enable sharing & validation
Speech bubble: “Interesting paper. Where’s the data?”
10. Why does data need to be managed?
Ensure data can be located
Enable analysis
Comply with Funder & School requirements
Ability to understand for current and future need
Enable sharing & validation
Speech bubble: “Interesting paper. Where’s the data?”
11. Researcher Challenges
Issues/challenges encountered when creating, managing, and sharing research data (web survey results)
Response type: multiple choice checkbox + free text for other challenges
Other challenges
• Database creation & management
• Storage of physical questionnaires
• Lack of time
• Software instability (particularly NVivo)
• Ability to enter & access data at different locations
12. Training Needs
Interest in training on topics related to data management (web survey results)
Note: Graph omits percentages for other responses (none, slight, moderate, no opinion)
14. RDM Support Service
Role of Library staff
Provide first point of contact
Help researchers to express requirements & needs
Direct to potential solution (staff, website)
Contribute to training activities
Incorporate data considerations into teaching
Location of Library staff
15. Data Access Over Time
digital vs. analogue
“traditionally, preserving things meant keeping them unchanged;
however … if we hold on to digital information without
modifications, accessing the information will become increasingly
more difficult, if not impossible.”
Su‐Shing Chen, 2001
data + computer + OS + application = information content
16. Change in Process over Time
Intel PC, 2000
Mac laptop, 2006
X64 Ubuntu laptop, 2010
Diagram labels: hardware, operating system, software application, information content
18. Task
• Select two of the following problems when managing digital data:
1. Difficulty locating data
2. Difficulty accessing media
3. Difficulty rendering data in an understandable form
4. Difficulty recreating data as originally intended
5. Difficulty understanding information content
6. Uncertain provenance
Consider the following questions:
a. In what circumstances will the chosen problem occur?
b. What consequences may follow if the problem occurs (e.g. financial implications)?
c. How could you ensure that the problem doesn’t occur?
d. What could you do to resolve the problem after it has occurred? (Can direct to someone for help)
19. 1. Difficulty Locating Data
Problem
“I created some data 5 years ago. Where is it?”
“I’ve lost my original disk. Do I have the data elsewhere?”
Scenarios & Reasons
Loss of storage media
Lots of data stored in many locations
Vague filenames make it difficult to locate
(Potential) Solutions
Preventative:
• Copy data to several storage devices (increases the likelihood of finding it)
Post event:
• Find better discovery software?
• Attempt to recreate content?
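The preventative step of copying data to several storage devices can be sketched in a few lines of Python. This is a minimal illustration, not a recommended tool: the function names `sha256_of` and `copy_to_locations` are hypothetical, and comparing SHA-256 checksums to confirm each copy is an added assumption that the slide does not mention.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_to_locations(source: Path, destinations: list[Path]) -> dict[Path, bool]:
    """Copy `source` into each destination folder.

    Returns a mapping from each copied file to True if its checksum
    matches the original, False otherwise.
    """
    expected = sha256_of(source)
    results = {}
    for folder in destinations:
        folder.mkdir(parents=True, exist_ok=True)
        target = folder / source.name
        shutil.copy2(source, target)  # copy2 also keeps file timestamps
        results[target] = sha256_of(target) == expected
    return results
```

A call such as `copy_to_locations(Path("survey.csv"), [Path("/mnt/usb"), Path("/mnt/server")])` (paths are placeholders) would leave one verified copy in each folder; any `False` value signals a copy that should be repeated.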
20. 2. Difficulty Accessing Media
Problem
“How do I access this old media?”
“Why can’t I read this disk?”
Scenario & Reasons
Media obsolescence
Physical deterioration & failure
(Potential) Solutions
Preventative:
• Copy data to several storage devices
• Transfer data to new storage media on obsolescence / every 3 years
• Deposit data into a data archive and/or copy to server
Post event:
• Data recovery software
21. Potential Storage Locations
Local machine & Storage
Pros: Cheap, high capacity storage, fast access
Cons: Lack of support; potential for theft, loss, or damage
Academic Storage Systems (Recommended)
Pros: Automatic monitoring & backup, multiple redundancy, remote access, secure (if required)
Cons: Limited space allocation, not always accessible overseas
Third party service providers
Pros: Automated backup, accessible in different countries (usually)
Cons: Security concerns, ownership concerns, services can close account at any time
Image: http://www.flickr.com/photos/m0n0/4479450696/
22. 3. Difficulty Rendering Data
Problem
“How can I view data?”
“Where do I find software to access my data?”
Scenarios & Reasons
Software obsolescence
New software uses a different decoding method
(Potential) Solutions
Preventative:
• Transform data to new formats (format conversion strategy)
• Maintain original machine and software to access content (computer museum)
Post event:
• Track down original software product
• Emulate original environment (emulation/virtualisation)
23. Choosing File Formats
Creation Preservation Dissemination
Content Type | Preferred Format | Acceptable Alternatives
Documents | Rich Text Format, Open Document Format | Microsoft DocX
Still Images | TIFF, JPEG 2000 (uncompressed) | PNG, RAW
Audio | WAV, AIFF, FLAC | MP3
AudioVideo | MPEG2, MPEG4 |
When working with multiple copies, decide which is the master copy
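One way to apply the format table on this slide is to check a list of filenames against it. The sketch below is illustrative only: the mapping from format names to file extensions is an assumption (the slide names formats, not extensions), RAW is left out because camera RAW files use many different extensions, and `classify` and `audit` are hypothetical helper names.

```python
from pathlib import Path

# Formats follow the slide's table; the extensions chosen for them are assumptions.
PREFERRED = {".rtf", ".odt", ".tif", ".tiff", ".jp2",
             ".wav", ".aif", ".aiff", ".flac", ".mpg", ".mp4"}
ACCEPTABLE = {".docx", ".png", ".mp3"}


def classify(filename: str) -> str:
    """Label a file as 'preferred', 'acceptable', or 'review' by its extension."""
    suffix = Path(filename).suffix.lower()
    if suffix in PREFERRED:
        return "preferred"
    if suffix in ACCEPTABLE:
        return "acceptable"
    return "review"  # not in the table: decide whether to convert it


def audit(filenames: list[str]) -> dict[str, list[str]]:
    """Group filenames by their classification."""
    groups = {"preferred": [], "acceptable": [], "review": []}
    for name in filenames:
        groups[classify(name)].append(name)
    return groups
```

Files that land in the "review" group are candidates for the format conversion strategy described earlier; the extension alone does not prove the file's actual encoding.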
24. 4. Difficulty Maintaining Authenticity
Problem
“Why does my data look different?”
Scenarios & Reasons
A new version of the software application uses a different decoding method
Different software application in use
(Potential) Solutions
Preventative:
• Determine significant properties that should be maintained
• Maintain original machine and software to access content (computer museum)
Post event:
• Emulate original environment (emulation/virtualisation)
25. 5. Difficulty Understanding Content
Problem
“Where was this information created?”
“Why did the creator make this decision?”
“What does this value mean?”
“How does this data relate to other content?”
Scenarios & Reasons
Memory fails – cannot remember decisions made
Disorganised and poorly labelled data
Lack of documentation
(Potential) Solutions
• Organise data (chronology, experiment type, location, content type)
• Adopt labelling conventions
• Documentation
Does a Rosetta stone exist for your data?
26. Filename conventions
• Consider the elements that will help you to organise and locate
content
– E.g. Participant ID, site of data collection, date of data collection
• Consider how data files and directories may be organised & sorted
– 001, 002, 003, 004, can be used for sequential files
– YYYY‐MM‐DD (2012‐12‐04) useful for organising by date (use year first)
• Identify different versions of content in filename (and in content)
– Creation date (YY‐MM‐DD)
– Version/draft number
• Consider how your filenames will look to others
– Avoid spaces ‐ ‘My file.pdf’ becomes ‘My%20file.pdf’ on the web
– Avoid capitalisation ‐ Alters file sorting & CAUSES HEADACHES!
Golden Rule: Be Consistent
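The conventions above can be applied consistently by generating filenames rather than typing them. The following is a minimal sketch; the element order (participant ID, site, date, version), the underscore separator, and the function name `build_filename` are assumptions chosen for illustration, not a prescribed standard.

```python
import re
from datetime import date


def build_filename(participant_id: int, site: str, collected: date,
                   version: int, ext: str) -> str:
    """Build a lowercase, space-free filename with sortable elements."""
    # Lowercase the site and replace spaces and punctuation with hyphens.
    site_slug = re.sub(r"[^a-z0-9]+", "-", site.lower()).strip("-")
    extension = ext.lower().lstrip(".")
    # Zero-padded ID and YYYY-MM-DD date keep alphabetical and logical order aligned.
    return (f"{participant_id:03d}_{site_slug}_"
            f"{collected.isoformat()}_v{version:02d}.{extension}")


name = build_filename(7, "London Site A", date(2012, 12, 4), 2, "CSV")
print(name)  # 007_london-site-a_2012-12-04_v02.csv
```

Because every element has a fixed position and width, a plain alphabetical sort groups files by participant, then site, then date, then version.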
27. Data Documentation
What would someone want to know if they
were looking at your data the first time?
1. What is the context of creation?
• Why did you create it? For what purpose?
• What methodology did you use? What assumptions were made?
• Who is the target audience?
2. Collection and set of files:
• What information does each file contain?
• When was it created?
• By whom?
• What actions were performed?
• How does the data contained in the collection relate to each other?
3. Individual components
• What is the meaning of this word/column/row, etc.?
• How are these items measured?
• What are the boundaries of the measurement?
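The questions under "Individual components" map naturally onto a simple data dictionary: one entry per column recording its meaning, unit, and valid range. The sketch below assumes numeric columns; the class `ColumnDoc`, the helper `out_of_range`, and the example column `temp_c` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnDoc:
    """Documentation for one column of a dataset."""
    name: str       # column label as it appears in the file
    meaning: str    # what the value represents
    unit: str       # how the item is measured
    minimum: float  # lower boundary of valid measurements
    maximum: float  # upper boundary of valid measurements


def out_of_range(doc: ColumnDoc, values: list[float]) -> list[float]:
    """Return the values that fall outside the documented boundaries."""
    return [v for v in values if not doc.minimum <= v <= doc.maximum]


temperature = ColumnDoc(
    name="temp_c",
    meaning="Air temperature at the collection site",
    unit="degrees Celsius",
    minimum=-40.0,
    maximum=60.0,
)
print(out_of_range(temperature, [21.5, 75.0, -3.0]))  # [75.0]
```

Keeping the boundaries alongside the description means the documentation can double as a basic quality check when the data is revisited.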
28. 6. Uncertain Provenance
Problem
1. “When was the data created and/or modified?”
2. “Who created/modified the data?”
3. “Why was it created and/or modified?”
Scenarios & Reasons
• Lack/Loss of trust in information content
• Reluctance to use information content
(Potential) Solutions
Preventative:
• Limit update to authorised users only
• Store change history
• Keep each version
Post event:
• Locate data creator & editor?
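Storing a change history can be as simple as an append-only log that answers the three provenance questions: when, who, and why. The sketch below is one possible approach under stated assumptions: the JSON Lines log format, the SHA-256 checksum field, and the function names `record_change` and `read_history` are illustrative choices, not part of the original slides.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def record_change(data_file: Path, log_file: Path, who: str, why: str) -> dict:
    """Append one entry describing the current state of `data_file` to the log."""
    entry = {
        "file": data_file.name,
        "sha256": hashlib.sha256(data_file.read_bytes()).hexdigest(),
        "who": who,                                      # who created/modified it
        "why": why,                                      # why it was changed
        "when": datetime.now(timezone.utc).isoformat(),  # when it was changed
    }
    with log_file.open("a", encoding="utf-8") as log:
        log.write(json.dumps(entry) + "\n")
    return entry


def read_history(log_file: Path) -> list[dict]:
    """Return all logged entries, oldest first."""
    with log_file.open(encoding="utf-8") as log:
        return [json.loads(line) for line in log if line.strip()]
```

Calling `record_change` after each edit builds a timeline that can later be compared against the file's current checksum. Saving a copy of the file at each entry would additionally satisfy "Keep each version".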
29. Things to Recommend
Advise researchers to:
1. Choose an appropriate storage location and create backups
2. Organise data in a consistent and logical manner
3. Document the data and information content (as well as structure)
4. Consider how you will ensure that information can be accessed in the long term
5. Consider potential for data sharing and ensure it is performed with consideration of ethics
30. A Few Good References
• Digital Curation Centre
http://www.dcc.ac.uk/resources
• MANTRA – Data Management training for PhD students
http://datalib.edina.ac.uk/mantra/
• UK Data Archive – Managing and Sharing Data
http://www.data-archive.ac.uk/media/2894/managingsharing.pdf
• Cambridge University – RDM Guidance
http://www.lib.cam.ac.uk/dataman/index.html
• Australian National Data Service
http://ands.org.au/resource/data-management-planning.html
• LSHTM Research Data Management Support Service
http://blogs.lshtm.ac.uk/rdmss/