MAKING SENSE OF A DIGITAL COLLECTION
This work is licensed under a
Creative Commons Attribution 2.0
UK: England & Wales License
Gareth Knight
London School of Hygiene & Tropical Medicine
gareth.knight@lshtm.ac.uk
Getting Started in Digital Preservation
The Information Technologists, London
23rd April 2015
Case Studies
National service that preserved
research, teaching and learning
resources in arts & humanities
between 1996 - 2008
Institutional RDM service that
helps LSHTM researchers to
curate & preserve research data
in public health & tropical
medicine
Need for Digital Preservation
Data storage media + Computing device + Operating system + Software application = Information
Deteriorate & change over time
Obsolete & replaced over time
What does this
mean?
“Digital information lasts forever – or five years, whichever comes first”
Jeff Rothenberg, 1997
Climb the preservation mountain
“the series of managed activities necessary to ensure continued
access to digital materials for as long as necessary.”
Neil Beagrie and Maggie Jones (2008)
Beagrie & Jones: http://www.dpconline.org/advice/preservationhandbook/introduction/definitions-and-concepts
Caplan: http://journals.ala.org/ltr/article/view/4224/4809
Modified version of Caplan’s Preservation Pyramid
Content can be used
Content is understandable
Content is rendered accurately
Bits are stored exactly
Its value is recognised & it is acquired
Data exists
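The "Bits are stored exactly" layer is commonly checked by comparing checksums (fixity values) before and after a transfer. A minimal sketch in Python, assuming SHA-256 is an acceptable algorithm; the function names `sha256_of` and `verify_copy` are illustrative and do not come from the slides:

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copy(original, copy):
    """Return True if both files have the same digest, i.e. identical bytes."""
    return sha256_of(original) == sha256_of(copy)
```

Recording the digest at acquisition time gives a reference value that later copies can be checked against.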
Digital Detectives
• Digital preservation is often a process
of investigation & deduction
• Resource intensive
– Time
– Physical space
– Hardware/software costs
• How much effort are you willing to
make? What is good enough?
https://www.flickr.com/photos/ollieolarte/3028314931
Acquire data
Acquisition depends upon the object
to be preserved & how it is stored
• Media: Floppy disk, CD/DVD, ZIP/Jaz disk,
hard disk, solid state devices, etc.
• Electronic: Email, cloud services
Invest in infrastructure to support
preservation process
• Computer hardware
• Media readers
• 3rd party services can provide
advice and hardware rental
where needed
https://www.flickr.com/photos/adactio/13127134455
Case Study: AHDS History dataset
Deposited by children of noted researcher in
2006 & processed by GK
Documentation:
Accompanying notes in researcher’s
handwriting described a history DB they were
working on in 1988.
Challenges:
• 5.25" disk drive was available
• Disk was failing, but managed to create a
complete copy on 5th attempt
• Disk analysis revealed text content…
The author's short stories, not a dataset!
Result:
Not accessioned, but children were pleased
http://www.old-computers.com/museum/computer.asp?st=1&c=810
History database created on a Shelton
Instruments Sig-Net, running the CP/M
2.2 operating system, in 1988 & saved to
5.25” disk
Check completeness
What does the creator intend to
provide?
• Data
• Documentation
• Research instruments
What have they actually provided?
• Some data
• Creation software & random files
• Personal music collection?
• Request a file manifest:
– Filename
– Description
– Format
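A first-pass manifest can also be generated from the deposit itself and returned to the creator to complete. The sketch below assumes a CSV manifest is acceptable. The format column holds only the file extension, which is a hint rather than a reliable identification, and `build_manifest` and `write_manifest` are hypothetical names:

```python
import csv
from pathlib import Path


def build_manifest(root):
    """List every file under root, leaving the description blank for the creator."""
    root = Path(root)
    rows = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rows.append({
            "filename": path.relative_to(root).as_posix(),
            "description": "",  # to be supplied by the data creator
            "format": path.suffix.lower().lstrip(".") or "unknown",
        })
    return rows


def write_manifest(rows, out_path):
    """Write the manifest rows to a CSV file with a header line."""
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["filename", "description", "format"])
        writer.writeheader()
        writer.writerows(rows)
```

Comparing this listing with what the creator says they intended to provide makes gaps and stray files (such as a personal music collection) easier to spot.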
https://www.flickr.com/photos/kyngpao/14455832915
Case Study: Early English Books Online
Collection of 125,000 early printed books
deposited for preservation:
• XML files, scanned TIFFs & PDFs for each
page
• Well structured & labelled
Problems:
• Hard disk was failing
• XML output from Content Management
system - incomplete header & missing
schema
• 30% of files referenced in XML were missing
Solution:
• Obtained schema & missing files (but took a
long, long time)
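A check of this kind can be scripted so that missing files are found at deposit rather than later. The sketch below is not the method used for this collection: it assumes each file is referenced through a plain, un-namespaced attribute (hypothetically named `href`) holding a path relative to a base directory, and the real schema may differ:

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def missing_references(xml_path, base_dir, attribute="href"):
    """Return referenced paths (taken from `attribute`) that are not files under base_dir."""
    base_dir = Path(base_dir)
    missing = []
    for element in ET.parse(xml_path).iter():
        reference = element.get(attribute)
        if reference and not (base_dir / reference).is_file():
            missing.append(reference)
    return missing
```

Dividing the length of the returned list by the total number of references gives a figure comparable to the 30% quoted above.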
Render data
Decode file format
Reflect tools & software available at
point of creation:
• Information content
• Contextual information
(documentation/metadata)
Analyse organisation structure
Intrinsic relationships important for
decoding multi-file objects
• Filenames & directory structure
Solution
• Specialist software may be required to
access
• Liaise with data creators
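A first step in decoding a file is often to look at its opening bytes rather than trusting the extension. The sketch below checks a handful of well-known signatures. The list is illustrative and far from complete, dedicated identification tools cover many more formats, and `identify` is a hypothetical name:

```python
# (signature bytes, label) pairs, checked against the start of the file
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
    (b"<?xml", "XML"),
    (b"FCS", "Flow Cytometry Standard"),
]


def identify(path):
    """Return a format label based on the first bytes of the file, or 'unknown'."""
    with open(path, "rb") as handle:
        head = handle.read(16)
    for magic, name in SIGNATURES:
        if head.startswith(magic):
            return name
    return "unknown"
```

An "unknown" result is itself useful: it flags the files where the data creator needs to be asked what software produced them.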
https://www.flickr.com/photos/hawksanddoves/83818392
How many locks do you have to get
through to reach your destination?
Case Study: Scientific dataset
USB stick of LSHTM dataset containing:
• FCS2.0 - tabular data outlining experiments
to count cells, sort them & identify
biomarkers
• Leica Experiment Collection - .lei library file &
associated images with embedded metadata
Challenges:
• Domain & proprietary formats
– FITS (file) provides limited info on .lei
– FCS not recognised
• Complex relationship in Leica experiment -
recorded in filename & internal manifest
(partial) Solution
• Store files as-is
• Obtain text output of FCS files
• Analyse using open source tools
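As an illustration of obtaining text output from FCS files, the sketch below reads the keyword/value pairs from the TEXT segment. It relies on the FCS header layout (a version string followed by fixed-width ASCII byte offsets) and is deliberately simplified: it does not handle delimiters escaped by doubling, or TEXT segments placed beyond the range the header can express. `read_fcs_text` is a hypothetical name, not a function from any tool named in the slides:

```python
def read_fcs_text(path):
    """Return the keyword/value pairs from the TEXT segment of an FCS file."""
    with open(path, "rb") as handle:
        header = handle.read(58)
        if not header.startswith(b"FCS"):
            raise ValueError("not an FCS file")
        # Bytes 10-17 and 18-25 hold the TEXT segment start and end offsets
        text_start = int(header[10:18].decode("ascii"))
        text_end = int(header[18:26].decode("ascii"))  # offset of the last byte, inclusive
        handle.seek(text_start)
        segment = handle.read(text_end - text_start + 1)
    delimiter = segment[:1]  # the first byte of the segment defines the delimiter
    fields = segment[1:].split(delimiter)
    if fields and fields[-1] == b"":
        fields.pop()  # the segment normally ends with the delimiter
    pairs = zip(fields[0::2], fields[1::2])
    return {key.decode("latin-1"): value.decode("latin-1") for key, value in pairs}
```

Writing the returned pairs to a plain text file alongside the original keeps the experiment parameters readable even if FCS software becomes unavailable.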
Understand data
• 17th-18th Century Enlightenment
built on information sharing
• Openness & transparency essential
for academic research
– Evidence of activity
– Open to scrutiny & replication
• Can you establish who, what, where,
when & how?
• How much documentation can only
be found in the data creator’s head?
https://www.flickr.com/photos/domiriel/5234590796
Case Study: Adolphe Appia
Warwick Uni. School of Theatre Studies modelled
performance space of Appia's Festspielhaus at
Hellerau.
Collection deposited on several CDs:
• Digitised photographs of 1991 performance
• VRML 3D models of performance space
• Videos of 3D models in .mov format
• Documentation & Metadata
Problem
• Image metadata ‘disappeared’ on transfer
Solution:
• Descriptions added to file attributes, which were
being removed when written to disc
• Output file attributes to text file
• Compressed files and copied to disk
© King's Visualisation Lab, King's College London
http://www.kvl.cch.kcl.ac.uk/appia.html
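Writing file attributes to a text file before transfer can be sketched as below. This version captures only basic filesystem attributes (path, size, last-modified time). Descriptions held in operating-system-specific file properties, as in this case study, would need a platform-specific tool, so this is a partial illustration rather than the method actually used. `dump_attributes` is a hypothetical name:

```python
import time
from pathlib import Path


def dump_attributes(root, out_path):
    """Write path, size and last-modified time of each file under root to a
    tab-separated text file and return the number of files listed.
    out_path should sit outside root so the listing does not include itself."""
    root = Path(root)
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        out.write("path\tsize_bytes\tmodified_utc\n")
        for path in sorted(p for p in root.rglob("*") if p.is_file()):
            info = path.stat()
            stamp = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(info.st_mtime))
            out.write(f"{path.relative_to(root).as_posix()}\t{info.st_size}\t{stamp}\n")
            count += 1
    return count
```

Because the listing is plain text, it survives being written to media that silently drops file attributes.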
Final thoughts
1. Analyse your needs & capabilities
– What can you do with existing resources?
– What future investment is possible?
2. Inform users of your expectations from
the outset
– File formats
– Documentation
– File structure & naming conventions
– Permissions
3. Help them to fulfil expectations
– Advice and guidance
http://www.keepcalm-o-matic.co.uk/p/keep-calm-and-curate-41/

More Related Content

What's hot

Making Materials Findable at the State Library of Victoria
Making Materials Findable at the State Library of VictoriaMaking Materials Findable at the State Library of Victoria
Making Materials Findable at the State Library of Victoria
Alan Manifold
 
10-1-13 “Research Data Curation at UC San Diego: An Overview” Presentation Sl...
10-1-13 “Research Data Curation at UC San Diego: An Overview” Presentation Sl...10-1-13 “Research Data Curation at UC San Diego: An Overview” Presentation Sl...
10-1-13 “Research Data Curation at UC San Diego: An Overview” Presentation Sl...
DuraSpace
 
Linked Data: thinking big, starting small
Linked Data: thinking big, starting smallLinked Data: thinking big, starting small
Linked Data: thinking big, starting small
Peter Neish
 
10-31-13 “Researcher Perspectives of Data Curation” Presentation Slides
10-31-13 “Researcher Perspectives of Data Curation” Presentation Slides10-31-13 “Researcher Perspectives of Data Curation” Presentation Slides
10-31-13 “Researcher Perspectives of Data Curation” Presentation Slides
DuraSpace
 
Tuesday 5 May 2020: Contextualizing and engaging with Web domains, Valérie Sc...
Tuesday 5 May 2020: Contextualizing and engaging with Web domains, Valérie Sc...Tuesday 5 May 2020: Contextualizing and engaging with Web domains, Valérie Sc...
Tuesday 5 May 2020: Contextualizing and engaging with Web domains, Valérie Sc...
WARCnet
 
Inmagic user group meeting Melbourne june 2011
Inmagic user group meeting Melbourne june 2011Inmagic user group meeting Melbourne june 2011
Inmagic user group meeting Melbourne june 2011Peter Neish
 
Harvesting and semantically tagging media releases from political websites us...
Harvesting and semantically tagging media releases from political websites us...Harvesting and semantically tagging media releases from political websites us...
Harvesting and semantically tagging media releases from political websites us...
Peter Neish
 
ACS National Meeting - Libraries as Hubs for Emerging Technologies - 14_0813
ACS National Meeting - Libraries as Hubs for Emerging Technologies - 14_0813ACS National Meeting - Libraries as Hubs for Emerging Technologies - 14_0813
ACS National Meeting - Libraries as Hubs for Emerging Technologies - 14_0813
jeffreylancaster
 
GIS Day 2015: Geoinformatics, Open Source and Videos - a library perspective
GIS Day 2015: Geoinformatics, Open Source and Videos - a library perspectiveGIS Day 2015: Geoinformatics, Open Source and Videos - a library perspective
GIS Day 2015: Geoinformatics, Open Source and Videos - a library perspective
Peter Löwe
 
Resources, resources, resources: the three rs of the Web
Resources, resources, resources: the three rs of the WebResources, resources, resources: the three rs of the Web
Resources, resources, resources: the three rs of the Web
Scottish Library & Information Council (SLIC), CILIP in Scotland (CILIPS)
 
Incubating Apache Linda (ApacheCon Europe 2012)
Incubating Apache Linda (ApacheCon Europe 2012)Incubating Apache Linda (ApacheCon Europe 2012)
Incubating Apache Linda (ApacheCon Europe 2012)
Sergio Fernández
 
Linked Data
Linked DataLinked Data
Linked Data
Anja Jentzsch
 
Aggregation Using Linked Data – LOCAH Project Experiences
Aggregation Using Linked Data – LOCAH Project ExperiencesAggregation Using Linked Data – LOCAH Project Experiences
Aggregation Using Linked Data – LOCAH Project Experiences
Adrian Stevenson
 
PhD Thesis Digitisation Project
PhD Thesis Digitisation ProjectPhD Thesis Digitisation Project
PhD Thesis Digitisation Project
Lorna Campbell
 
Wetzel, Baish, Johnson, Reich, and Grant "Digital Preservation: Current Efforts"
Wetzel, Baish, Johnson, Reich, and Grant "Digital Preservation: Current Efforts"Wetzel, Baish, Johnson, Reich, and Grant "Digital Preservation: Current Efforts"
Wetzel, Baish, Johnson, Reich, and Grant "Digital Preservation: Current Efforts"
National Information Standards Organization (NISO)
 
IPTC Semantic Web Working Group 2011 Autumn Working Group
IPTC Semantic Web Working Group 2011 Autumn Working GroupIPTC Semantic Web Working Group 2011 Autumn Working Group
IPTC Semantic Web Working Group 2011 Autumn Working Group
Stuart Myles
 
ACS Summer Institute - Emerging Roles of Librarians - 14_0731
ACS Summer Institute - Emerging Roles of Librarians - 14_0731ACS Summer Institute - Emerging Roles of Librarians - 14_0731
ACS Summer Institute - Emerging Roles of Librarians - 14_0731
jeffreylancaster
 
Bingham, De Wild & Aasman Presentation
Bingham, De Wild & Aasman PresentationBingham, De Wild & Aasman Presentation
Bingham, De Wild & Aasman Presentation
WARCnet
 
Historical Photographs of China - the journey towards sustainability and utility
Historical Photographs of China - the journey towards sustainability and utilityHistorical Photographs of China - the journey towards sustainability and utility
Historical Photographs of China - the journey towards sustainability and utility
Simon Price
 
Bibliosight (UKCoRR presentation)
Bibliosight (UKCoRR presentation)Bibliosight (UKCoRR presentation)
Bibliosight (UKCoRR presentation)
Nick Sheppard
 

What's hot (20)

Making Materials Findable at the State Library of Victoria
Making Materials Findable at the State Library of VictoriaMaking Materials Findable at the State Library of Victoria
Making Materials Findable at the State Library of Victoria
 
10-1-13 “Research Data Curation at UC San Diego: An Overview” Presentation Sl...
10-1-13 “Research Data Curation at UC San Diego: An Overview” Presentation Sl...10-1-13 “Research Data Curation at UC San Diego: An Overview” Presentation Sl...
10-1-13 “Research Data Curation at UC San Diego: An Overview” Presentation Sl...
 
Linked Data: thinking big, starting small
Linked Data: thinking big, starting smallLinked Data: thinking big, starting small
Linked Data: thinking big, starting small
 
10-31-13 “Researcher Perspectives of Data Curation” Presentation Slides
10-31-13 “Researcher Perspectives of Data Curation” Presentation Slides10-31-13 “Researcher Perspectives of Data Curation” Presentation Slides
10-31-13 “Researcher Perspectives of Data Curation” Presentation Slides
 
Tuesday 5 May 2020: Contextualizing and engaging with Web domains, Valérie Sc...
Tuesday 5 May 2020: Contextualizing and engaging with Web domains, Valérie Sc...Tuesday 5 May 2020: Contextualizing and engaging with Web domains, Valérie Sc...
Tuesday 5 May 2020: Contextualizing and engaging with Web domains, Valérie Sc...
 
Inmagic user group meeting Melbourne june 2011
Inmagic user group meeting Melbourne june 2011Inmagic user group meeting Melbourne june 2011
Inmagic user group meeting Melbourne june 2011
 
Harvesting and semantically tagging media releases from political websites us...
Harvesting and semantically tagging media releases from political websites us...Harvesting and semantically tagging media releases from political websites us...
Harvesting and semantically tagging media releases from political websites us...
 
ACS National Meeting - Libraries as Hubs for Emerging Technologies - 14_0813
ACS National Meeting - Libraries as Hubs for Emerging Technologies - 14_0813ACS National Meeting - Libraries as Hubs for Emerging Technologies - 14_0813
ACS National Meeting - Libraries as Hubs for Emerging Technologies - 14_0813
 
GIS Day 2015: Geoinformatics, Open Source and Videos - a library perspective
GIS Day 2015: Geoinformatics, Open Source and Videos - a library perspectiveGIS Day 2015: Geoinformatics, Open Source and Videos - a library perspective
GIS Day 2015: Geoinformatics, Open Source and Videos - a library perspective
 
Resources, resources, resources: the three rs of the Web
Resources, resources, resources: the three rs of the WebResources, resources, resources: the three rs of the Web
Resources, resources, resources: the three rs of the Web
 
Incubating Apache Linda (ApacheCon Europe 2012)
Incubating Apache Linda (ApacheCon Europe 2012)Incubating Apache Linda (ApacheCon Europe 2012)
Incubating Apache Linda (ApacheCon Europe 2012)
 
Linked Data
Linked DataLinked Data
Linked Data
 
Aggregation Using Linked Data – LOCAH Project Experiences
Aggregation Using Linked Data – LOCAH Project ExperiencesAggregation Using Linked Data – LOCAH Project Experiences
Aggregation Using Linked Data – LOCAH Project Experiences
 
PhD Thesis Digitisation Project
PhD Thesis Digitisation ProjectPhD Thesis Digitisation Project
PhD Thesis Digitisation Project
 
Wetzel, Baish, Johnson, Reich, and Grant "Digital Preservation: Current Efforts"
Wetzel, Baish, Johnson, Reich, and Grant "Digital Preservation: Current Efforts"Wetzel, Baish, Johnson, Reich, and Grant "Digital Preservation: Current Efforts"
Wetzel, Baish, Johnson, Reich, and Grant "Digital Preservation: Current Efforts"
 
IPTC Semantic Web Working Group 2011 Autumn Working Group
IPTC Semantic Web Working Group 2011 Autumn Working GroupIPTC Semantic Web Working Group 2011 Autumn Working Group
IPTC Semantic Web Working Group 2011 Autumn Working Group
 
ACS Summer Institute - Emerging Roles of Librarians - 14_0731
ACS Summer Institute - Emerging Roles of Librarians - 14_0731ACS Summer Institute - Emerging Roles of Librarians - 14_0731
ACS Summer Institute - Emerging Roles of Librarians - 14_0731
 
Bingham, De Wild & Aasman Presentation
Bingham, De Wild & Aasman PresentationBingham, De Wild & Aasman Presentation
Bingham, De Wild & Aasman Presentation
 
Historical Photographs of China - the journey towards sustainability and utility
Historical Photographs of China - the journey towards sustainability and utilityHistorical Photographs of China - the journey towards sustainability and utility
Historical Photographs of China - the journey towards sustainability and utility
 
Bibliosight (UKCoRR presentation)
Bibliosight (UKCoRR presentation)Bibliosight (UKCoRR presentation)
Bibliosight (UKCoRR presentation)
 

Similar to Making Sense of a Digital Collection

Managing Software Selection and Acquisition: From Problem to Solution
Managing Software Selection and Acquisition: From Problem to SolutionManaging Software Selection and Acquisition: From Problem to Solution
Managing Software Selection and Acquisition: From Problem to Solution
suyu22
 
Data Archiving and Sharing
Data Archiving and SharingData Archiving and Sharing
Data Archiving and Sharing
C. Tobin Magle
 
3.7.17 DSpace for Data: issues, solutions and challenges Webinar Slides
3.7.17 DSpace for Data: issues, solutions and challenges Webinar Slides3.7.17 DSpace for Data: issues, solutions and challenges Webinar Slides
3.7.17 DSpace for Data: issues, solutions and challenges Webinar Slides
DuraSpace
 
Impact of Covid-19 on Learning and Education
Impact of Covid-19 on Learning and EducationImpact of Covid-19 on Learning and Education
Impact of Covid-19 on Learning and Education
MANENDRASINGH30
 
Keynote: Unexpected repurposing
Keynote: Unexpected repurposingKeynote: Unexpected repurposing
Keynote: Unexpected repurposing
labsbl
 
Managing provenance in the Social Sciences: the Data Documentation Initiative...
Managing provenance in the Social Sciences: the Data Documentation Initiative...Managing provenance in the Social Sciences: the Data Documentation Initiative...
Managing provenance in the Social Sciences: the Data Documentation Initiative...
ARDC
 
The Data Management Ecosystem
The Data Management EcosystemThe Data Management Ecosystem
The Data Management EcosystemJohn Kunze
 
The future of the DCC
The future of the DCCThe future of the DCC
The future of the DCC
Chris Rusbridge
 
Presentation Slides, “Creating Access to Audio & Video Digital Media: The Va...
Presentation Slides, “Creating Access to Audio & Video Digital Media:  The Va...Presentation Slides, “Creating Access to Audio & Video Digital Media:  The Va...
Presentation Slides, “Creating Access to Audio & Video Digital Media: The Va...
DuraSpace
 
Navigating the Analog Waves: Digitizing Audio Cassettes for Your Collection
Navigating the Analog Waves: Digitizing Audio Cassettes for Your CollectionNavigating the Analog Waves: Digitizing Audio Cassettes for Your Collection
Navigating the Analog Waves: Digitizing Audio Cassettes for Your Collection
Kay Gregg
 
"Filling the Digital Preservation Gap" with Archivematica
"Filling the Digital Preservation Gap" with Archivematica"Filling the Digital Preservation Gap" with Archivematica
"Filling the Digital Preservation Gap" with Archivematica
Jenny Mitcham
 
From Data to Data: One version of a History of Scholarly Communication
From Data to Data: One version of a History of Scholarly CommunicationFrom Data to Data: One version of a History of Scholarly Communication
From Data to Data: One version of a History of Scholarly Communication
Andrew Treloar
 
RDAP13 John Kunze: The Data Management Ecosystem
RDAP13 John Kunze: The Data Management EcosystemRDAP13 John Kunze: The Data Management Ecosystem
RDAP13 John Kunze: The Data Management Ecosystem
ASIS&T
 
Integrating an electronic lab notebook with a data repository; American Chemi...
Integrating an electronic lab notebook with a data repository; American Chemi...Integrating an electronic lab notebook with a data repository; American Chemi...
Integrating an electronic lab notebook with a data repository; American Chemi...
rmacneil88
 
Elns and repositories, American Chemical Society, Dallas, March 2014
Elns and repositories, American Chemical Society, Dallas, March 2014Elns and repositories, American Chemical Society, Dallas, March 2014
Elns and repositories, American Chemical Society, Dallas, March 2014
ResearchSpace
 
Pitts Library Digitization Initiatives
Pitts Library Digitization InitiativesPitts Library Digitization Initiatives
Pitts Library Digitization Initiatives
jbweave
 
ALIAOnline Practical Linked (Open) Data for Libraries, Archives & Museums
ALIAOnline Practical Linked (Open) Data for Libraries, Archives & MuseumsALIAOnline Practical Linked (Open) Data for Libraries, Archives & Museums
ALIAOnline Practical Linked (Open) Data for Libraries, Archives & MuseumsJon Voss
 
Elag workshop sessie 1 en 2 v10
Elag workshop sessie 1 en 2 v10Elag workshop sessie 1 en 2 v10
Elag workshop sessie 1 en 2 v10
Jeroen Rombouts
 
RDAP 16 Poster: Challenges and Opportunities in an Institutional Repository S...
RDAP 16 Poster: Challenges and Opportunities in an Institutional Repository S...RDAP 16 Poster: Challenges and Opportunities in an Institutional Repository S...
RDAP 16 Poster: Challenges and Opportunities in an Institutional Repository S...
ASIS&T
 

Similar to Making Sense of a Digital Collection (20)

Ji cv6n1
Ji cv6n1Ji cv6n1
Ji cv6n1
 
Managing Software Selection and Acquisition: From Problem to Solution
Managing Software Selection and Acquisition: From Problem to SolutionManaging Software Selection and Acquisition: From Problem to Solution
Managing Software Selection and Acquisition: From Problem to Solution
 
Data Archiving and Sharing
Data Archiving and SharingData Archiving and Sharing
Data Archiving and Sharing
 
3.7.17 DSpace for Data: issues, solutions and challenges Webinar Slides
3.7.17 DSpace for Data: issues, solutions and challenges Webinar Slides3.7.17 DSpace for Data: issues, solutions and challenges Webinar Slides
3.7.17 DSpace for Data: issues, solutions and challenges Webinar Slides
 
Impact of Covid-19 on Learning and Education
Impact of Covid-19 on Learning and EducationImpact of Covid-19 on Learning and Education
Impact of Covid-19 on Learning and Education
 
Keynote: Unexpected repurposing
Keynote: Unexpected repurposingKeynote: Unexpected repurposing
Keynote: Unexpected repurposing
 
Managing provenance in the Social Sciences: the Data Documentation Initiative...
Managing provenance in the Social Sciences: the Data Documentation Initiative...Managing provenance in the Social Sciences: the Data Documentation Initiative...
Managing provenance in the Social Sciences: the Data Documentation Initiative...
 
The Data Management Ecosystem
The Data Management EcosystemThe Data Management Ecosystem
The Data Management Ecosystem
 
The future of the DCC
The future of the DCCThe future of the DCC
The future of the DCC
 
Presentation Slides, “Creating Access to Audio & Video Digital Media: The Va...
Presentation Slides, “Creating Access to Audio & Video Digital Media:  The Va...Presentation Slides, “Creating Access to Audio & Video Digital Media:  The Va...
Presentation Slides, “Creating Access to Audio & Video Digital Media: The Va...
 
Navigating the Analog Waves: Digitizing Audio Cassettes for Your Collection
Navigating the Analog Waves: Digitizing Audio Cassettes for Your CollectionNavigating the Analog Waves: Digitizing Audio Cassettes for Your Collection
Navigating the Analog Waves: Digitizing Audio Cassettes for Your Collection
 
"Filling the Digital Preservation Gap" with Archivematica
"Filling the Digital Preservation Gap" with Archivematica"Filling the Digital Preservation Gap" with Archivematica
"Filling the Digital Preservation Gap" with Archivematica
 
From Data to Data: One version of a History of Scholarly Communication
From Data to Data: One version of a History of Scholarly CommunicationFrom Data to Data: One version of a History of Scholarly Communication
From Data to Data: One version of a History of Scholarly Communication
 
RDAP13 John Kunze: The Data Management Ecosystem
RDAP13 John Kunze: The Data Management EcosystemRDAP13 John Kunze: The Data Management Ecosystem
RDAP13 John Kunze: The Data Management Ecosystem
 
Integrating an electronic lab notebook with a data repository; American Chemi...
Integrating an electronic lab notebook with a data repository; American Chemi...Integrating an electronic lab notebook with a data repository; American Chemi...
Integrating an electronic lab notebook with a data repository; American Chemi...
 
Elns and repositories, American Chemical Society, Dallas, March 2014
Elns and repositories, American Chemical Society, Dallas, March 2014Elns and repositories, American Chemical Society, Dallas, March 2014
Elns and repositories, American Chemical Society, Dallas, March 2014
 
Pitts Library Digitization Initiatives
Pitts Library Digitization InitiativesPitts Library Digitization Initiatives
Pitts Library Digitization Initiatives
 
ALIAOnline Practical Linked (Open) Data for Libraries, Archives & Museums
ALIAOnline Practical Linked (Open) Data for Libraries, Archives & MuseumsALIAOnline Practical Linked (Open) Data for Libraries, Archives & Museums
ALIAOnline Practical Linked (Open) Data for Libraries, Archives & Museums
 
Elag workshop sessie 1 en 2 v10
Elag workshop sessie 1 en 2 v10Elag workshop sessie 1 en 2 v10
Elag workshop sessie 1 en 2 v10
 
RDAP 16 Poster: Challenges and Opportunities in an Institutional Repository S...
RDAP 16 Poster: Challenges and Opportunities in an Institutional Repository S...RDAP 16 Poster: Challenges and Opportunities in an Institutional Repository S...
RDAP 16 Poster: Challenges and Opportunities in an Institutional Repository S...
 

More from GarethKnight

Supporting Open Science in Research
Supporting Open Science in ResearchSupporting Open Science in Research
Supporting Open Science in Research
GarethKnight
 
Building Sustainability: Preserving research data without breaking the bank
Building Sustainability: Preserving research data without breaking the bankBuilding Sustainability: Preserving research data without breaking the bank
Building Sustainability: Preserving research data without breaking the bank
GarethKnight
 
GIS: A project by project prospective
GIS: A project by project prospectiveGIS: A project by project prospective
GIS: A project by project prospective
GarethKnight
 
Complying with EPSRC policy: An LSHTM case study
Complying with EPSRC policy: An LSHTM case studyComplying with EPSRC policy: An LSHTM case study
Complying with EPSRC policy: An LSHTM case study
GarethKnight
 
Data Management for Librarians: An Introduction
Data Management for Librarians: An IntroductionData Management for Librarians: An Introduction
Data Management for Librarians: An Introduction
GarethKnight
 
Challenges in setting up an RDM Support Service
Challenges in setting up an RDM Support ServiceChallenges in setting up an RDM Support Service
Challenges in setting up an RDM Support Service
GarethKnight
 
Research Data Management: What is it and why is the Library & Archives Servic...
Research Data Management: What is it and why is the Library & Archives Servic...Research Data Management: What is it and why is the Library & Archives Servic...
Research Data Management: What is it and why is the Library & Archives Servic...
GarethKnight
 
Doing research better: The role of meta‐data
Doing research better: The role of meta‐dataDoing research better: The role of meta‐data
Doing research better: The role of meta‐data
GarethKnight
 
Laying the Foundation: Establishing an institutional RDM Support Service for ...
Laying the Foundation: Establishing an institutional RDM Support Service for ...Laying the Foundation: Establishing an institutional RDM Support Service for ...
Laying the Foundation: Establishing an institutional RDM Support Service for ...
GarethKnight
 
Preservation Planning: Choosing a suitable digital preservation strategy
Preservation Planning: Choosing a suitable digital preservation strategyPreservation Planning: Choosing a suitable digital preservation strategy
Preservation Planning: Choosing a suitable digital preservation strategy
GarethKnight
 
Watching the Detectives: Using digital forensics techniques to investigate th...
Watching the Detectives: Using digital forensics techniques to investigate th...Watching the Detectives: Using digital forensics techniques to investigate th...
Watching the Detectives: Using digital forensics techniques to investigate th...
GarethKnight
 
Introduction to digital curation
Introduction to digital curationIntroduction to digital curation
Introduction to digital curation
GarethKnight
 
Digital Forensics in the Archive
Digital Forensics in the ArchiveDigital Forensics in the Archive
Digital Forensics in the Archive
GarethKnight
 
Keep Calm and Curate
Keep Calm and CurateKeep Calm and Curate
Keep Calm and Curate
GarethKnight
 
Same as it ever was? Significant Properties and the preservation of meaning o...
Same as it ever was? Significant Properties and the preservation of meaning o...Same as it ever was? Significant Properties and the preservation of meaning o...
Same as it ever was? Significant Properties and the preservation of meaning o...
GarethKnight
 
Who Decides? Reinterpreting archival processes for the management of digital ...
Who Decides? Reinterpreting archival processes for the management of digital ...Who Decides? Reinterpreting archival processes for the management of digital ...
Who Decides? Reinterpreting archival processes for the management of digital ...
GarethKnight
 
Establishing the significant properties of digital research
Establishing the significant properties of digital researchEstablishing the significant properties of digital research
Establishing the significant properties of digital research
GarethKnight
 

More from GarethKnight (17)

Supporting Open Science in Research
Supporting Open Science in ResearchSupporting Open Science in Research
Supporting Open Science in Research
 
Building Sustainability: Preserving research data without breaking the bank
Building Sustainability: Preserving research data without breaking the bankBuilding Sustainability: Preserving research data without breaking the bank
Building Sustainability: Preserving research data without breaking the bank
 
GIS: A project by project prospective
GIS: A project by project prospectiveGIS: A project by project prospective
GIS: A project by project prospective
 
Complying with EPSRC policy: An LSHTM case study
Complying with EPSRC policy: An LSHTM case studyComplying with EPSRC policy: An LSHTM case study
Complying with EPSRC policy: An LSHTM case study
 
Data Management for Librarians: An Introduction
Making Sense of a Digital Collection

  • 1. MAKING SENSE OF A COLLECTION This work is licensed under a Creative Commons Attribution 2.0 UK: England & Wales License Gareth Knight London School of Hygiene & Tropical Medicine gareth.knight@lshtm.ac.uk Getting Started in Digital Preservation The Information Technologists, London 23rd April 2015
  • 2. Case Studies National service that preserved research, teaching and learning resources in arts & humanities between 1996 - 2008 Institutional RDM service that helps LSHTM researchers to curate & preserve research data in public health & tropical medicine
  • 3. Need for Digital Preservation Data Storage media Computing device Operating System Software application Information + + + + = Deteriorate & change over time Obsolete & replaced over time What does this mean? “Digital information lasts forever – or five years, whichever comes first” Jeff Rothenberg, 1997
  • 4. Climb the preservation mountain “the series of managed activities necessary to ensure continued access to digital materials for as long as necessary.” Neil Beagrie and Maggie Jones (2008) Beagrie & Jones: http://www.dpconline.org/advice/preservationhandbook/introduction/definitions-and-concepts Caplan: http://journals.ala.org/ltr/article/view/4224/4809 Modified version of Caplan’s Preservation Pyramid Content can be used Content is understandable Content is rendered accurately Bits are stored exactly Its value is recognised & it is acquired Data exists
  • 5. Digital Detectives • Digital preservation often a process of investigation & deduction • Resource intensive – Time – Physical space – Hardware/software costs • How much effort are you willing to make? What is good enough? https://www.flickr.com/photos/ollieolarte/3028314931
  • 6. Acquire data Acquisition depends upon object to be preserved & how stored • Media: Floppy disk, CD/DVD, ZIP/Jaz disk, hard disk, solid state devices, etc. • Electronic: Email, cloud services Invest in infrastructure to support preservation process • Computer hardware • Media readers • 3rd party services can provide advice and hardware rental where needed https://www.flickr.com/photos/adactio/13127134455
  • 7. Case Study: AHDS History dataset Deposited by children of noted researcher in 2006 & processed by GK Documentation: Accompanying notes in researcher’s handwriting described a history DB they were working on in 1988. Challenges: • 5.25" disk drive was available • Disk was failing, but managed to create a complete copy on 5th attempt • Disk analysis revealed text content… The author's short stories, not a dataset! Result: Not accessioned, but children were pleased http://www.old-computers.com/museum/computer.asp?st=1&c=810 History database created on a Shelton Instruments Sig-Net, running CP/M 2.2 operating system in 1988 & saved to 5.25" disk
  • 8. Check completeness What does the creator intend to provide? • Data • Documentation • Research instruments What have they actually provided? • Some data • Creation software & random files • Personal music collection? • Request a file manifest: – Filename – Description – Format https://www.flickr.com/photos/kyngpao/14455832915
  • 9. Case Study: Early English Books Online Collection of 125,000 early printed books deposited for preservation: • XML files, scanned TIFFs & PDFs for each page • Well structured & labelled Problems: • Hard disk was failing • XML output from Content Management system - incomplete header & missing schema • 30% of files referenced in XML were missing Solution: • Obtained schema & missing files (but took a long, long time)
  • 10. Render data Decode file format Reflect tools & software available at point of creation: • Information content • Contextual information (documentation/metadata) Analyse organisation structure Intrinsic relationships important for decoding multi-file objects • Filenames & directory structure Solution • Specialist software may be required to access • Liaise with data creators https://www.flickr.com/photos/hawksanddoves/83818392 How many locks do you have to get through to reach your destination?
  • 11. Case Study: Scientific dataset USB stick of LSHTM dataset containing: • FCS2.0 - tabular data outlining experiments to count cells, sort them & identify biomarkers • Leica Experiment Collection - .lei library file & associated images with embedded metadata Challenges: • Domain & proprietary formats – FITS (file) provides limited info on .lei – FCS not recognised • Complex relationship in Leica experiment - recorded in filename & internal manifest (partial) Solution • Store files as-is • Obtain text output of FCS files • Analyse using open source tools
  • 12. Understand data • 17th-18th Century Enlightenment built on information sharing • Openness & transparency essential for academic research – Evidence of activity – Open to scrutiny & replication • Can you establish who, what, where, when & how? • How much documentation can only be found in the data creator’s head? https://www.flickr.com/photos/domiriel/5234590796
  • 13. Case Study: Adolphe Appia Warwick Uni. School of Theatre Studies modelled performance space of Appia's Festspielhaus at Hellerau. Collection deposited on several CDs: • Digitised photographs of 1991 performance • VRML 3D models of performance space • Videos of 3D models in .mov format • Documentation & Metadata Problem • Image metadata ‘disappeared’ on transfer Solution: • Descriptions added to file attributes, which were being removed when written to disc • Output file attributes to text file • Compressed files and copied to disk © King's Visualisation Lab, King's College London http://www.kvl.cch.kcl.ac.uk/appia.html
  • 14. Final thoughts 1. Analyse your needs & capabilities – What can you do with existing resources? – What future investment is possible? 2. Inform users of your expectations from the outset – File formats – Documentation – File structure & naming conventions – Permissions 3. Help them to fulfil expectations – Advice and guidance http://www.keepcalm-o-matic.co.uk/p/keep-calm-and-curate-41/
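
The file manifest requested in slide 8 (filename, description, format) and the completeness problem in slide 9 (files referenced but not delivered) can be sketched in a few lines of Python. This is a minimal illustration, not the workflow used in the case studies: the function names, the extra `bytes` and `sha256` columns, and the use of `mimetypes` to guess a format from the file extension are all assumptions made for the example. A dedicated format identification tool would give more reliable results than extension guessing.

```python
import csv
import hashlib
import mimetypes
from pathlib import Path

# Column names are illustrative; slide 8 only asks for filename, description, format.
FIELDS = ["filename", "description", "format", "bytes", "sha256"]


def build_manifest(root):
    """Return one record per file found under root (hypothetical helper)."""
    root = Path(root)
    records = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        guessed, _ = mimetypes.guess_type(path.name)  # extension-based guess only
        records.append({
            "filename": path.relative_to(root).as_posix(),
            "description": "",  # left blank for the data creator to complete
            "format": guessed or "unknown",
            "bytes": path.stat().st_size,
            # Reads the whole file into memory; very large files would need chunked hashing.
            "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
        })
    return records


def write_manifest(records, out_path):
    """Write the manifest as CSV so it can be sent back to the depositor."""
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(records)


def compare_with_expected(records, expected_filenames):
    """Return (missing, unexpected) filenames relative to the depositor's own list."""
    received = {record["filename"] for record in records}
    expected = set(expected_filenames)
    return sorted(expected - received), sorted(received - expected)
```

Running `compare_with_expected` against the list of files a depositor says they supplied shows both what is missing and what arrived unannounced (the "personal music collection" case), which gives a concrete starting point for the conversation with the data creator.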

Editor's Notes

  1. investigation & deduction