A case study of challenges encountered when acquiring and curating digital collections. Presented at the Digital Preservation Coalition workshop on April 23rd, 2015.
Making Sense of a Digital Collection
1. MAKING SENSE OF A COLLECTION
This work is licensed under a Creative Commons Attribution 2.0 UK: England & Wales License
Gareth Knight
London School of Hygiene & Tropical Medicine
gareth.knight@lshtm.ac.uk
Getting Started in Digital Preservation
The Information Technologists, London
23rd April 2015
2. Case Studies
National service that preserved research, teaching and learning resources in arts & humanities between 1996 - 2008
Institutional RDM service that helps LSHTM researchers to curate & preserve research data in public health & tropical medicine
3. Need for Digital Preservation
Data storage media + Computing device + Operating system + Software application = Information
• Deteriorate & change over time
• Obsolete & replaced over time
What does this mean?
“Digital information lasts forever – or five years, whichever comes first”
Jeff Rothenberg, 1997
4. Climb the preservation mountain
“the series of managed activities necessary to ensure continued access to digital materials for as long as necessary.”
Neil Beagrie and Maggie Jones (2008)
Beagrie & Jones: http://www.dpconline.org/advice/preservationhandbook/introduction/definitions-and-concepts
Caplan: http://journals.ala.org/ltr/article/view/4224/4809
Modified version of Caplan’s Preservation Pyramid:
• Content can be used
• Content is understandable
• Content is rendered accurately
• Bits are stored exactly
• Its value is recognised & it is acquired
• Data exists
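The "Bits are stored exactly" level is usually checked by recording a checksum when content is acquired and recomputing it later. The following is a minimal sketch of that idea, not a description of the workflow used in these case studies; the function names and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path: Path, expected_digest: str) -> bool:
    """True if the file's current digest matches the one recorded earlier."""
    return sha256_of(path) == expected_digest.lower()
```

A digest recorded at acquisition can be stored alongside the file and passed to `verify_fixity` whenever the content is copied or audited; a mismatch signals that the stored bits have changed.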
5. Digital Detectives
• Digital preservation often a process of investigation & deduction
• Resource intensive
– Time
– Physical space
– Hardware/software costs
• How much effort are you willing to make? What is good enough?
https://www.flickr.com/photos/ollieolarte/3028314931
6. Acquire data
Acquisition depends upon object to be preserved & how stored
• Media: Floppy disk, CD/DVD, ZIP/Jaz disk, hard disk, solid state devices, etc.
• Electronic: Email, cloud services
Invest in infrastructure to support preservation process
• Computer hardware
• Media readers
• 3rd party services can provide advice and hardware rental where needed
https://www.flickr.com/photos/adactio/13127134455
7. Case Study: AHDS History dataset
Deposited by children of noted researcher in 2006 & processed by GK
Documentation:
Accompanying notes in researcher’s handwriting described a history DB they were working on in 1988.
Challenges:
• 5.25" disk drive was available
• Disk was failing, but managed to create a complete copy on 5th attempt
• Disk analysis revealed text content… The author's short stories, not a dataset!
Result:
Not accessioned, but children were pleased
http://www.old-computers.com/museum/computer.asp?st=1&c=810
History database created on a Shelton Instruments Sig-Net, running CP/M 2.2 operating system in 1988 & saved to 5.25” disk
8. Check completeness
What does the creator intend to provide?
• Data
• Documentation
• Research instruments
What have they actually provided?
• Some data
• Creation software & random files
• Personal music collection?
• Request a file manifest:
– Filename
– Description
– Format
https://www.flickr.com/photos/kyngpao/14455832915
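When a depositor cannot supply a manifest, a starting point can be generated from the delivered files and sent back for them to complete. The sketch below is illustrative only: the column names follow the three fields listed on the slide, the description column is left blank for the creator to fill in, and the format is only a guess from the file extension using Python's standard `mimetypes` module (an assumption, not a substitute for proper format identification).

```python
import csv
import mimetypes
from pathlib import Path

FIELDS = ["filename", "description", "format"]


def build_manifest(root: Path) -> list[dict]:
    """List every file under root with a blank description and a guessed format."""
    rows = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        guessed, _ = mimetypes.guess_type(path.name)
        rows.append({
            "filename": path.relative_to(root).as_posix(),
            "description": "",  # to be completed by the data creator
            "format": guessed or "unknown",  # extension-based guess only
        })
    return rows


def write_manifest(rows: list[dict], out_path: Path) -> None:
    """Write the manifest rows to a CSV file with a header row."""
    with out_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
```

Comparing the completed manifest against what the creator said they intended to provide makes gaps (and unexpected extras such as a personal music collection) easier to spot.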
9. Case Study: Early English Books Online
Collection of 125,000 early printed books deposited for preservation:
• XML files, scanned TIFFs & PDFs for each page
• Well structured & labelled
Problems:
• Hard disk was failing
• XML output from Content Management system - incomplete header & missing schema
• 30% of files referenced in XML were missing
Solution:
• Obtained schema & missing files (but took a long, long time)
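A check for files that are referenced in XML but absent from the delivery can be sketched as follows. This is a hypothetical illustration, not the process used for this collection: it assumes the XML is well-formed, that file references appear as attribute values, and that a reference can be recognised by a `.tif`, `.tiff` or `.pdf` suffix. The real markup may record references differently.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Assumed suffixes that mark an attribute value as a file reference.
REFERENCED_SUFFIXES = (".tif", ".tiff", ".pdf")


def referenced_files(xml_path: Path) -> set[str]:
    """Collect attribute values that look like references to image or PDF files."""
    refs = set()
    for element in ET.parse(str(xml_path)).iter():
        for value in element.attrib.values():
            if value.lower().endswith(REFERENCED_SUFFIXES):
                refs.add(value)
    return refs


def missing_files(xml_path: Path, root: Path) -> list[str]:
    """Return referenced paths (relative to root) that do not exist on disk."""
    return sorted(
        ref for ref in referenced_files(xml_path) if not (root / ref).is_file()
    )
```

Running a check like this at the point of deposit gives the depositor a concrete list of what to resupply, rather than a percentage discovered later.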
10. Render data
Decode file format
Reflect tools & software available at point of creation:
• Information content
• Contextual information (documentation/metadata)
Analyse organisation structure
Intrinsic relationships important for decoding multi-file objects
• Filenames & directory structure
Solution
• Specialist software may be required to access
• Liaise with data creators
https://www.flickr.com/photos/hawksanddoves/83818392
How many locks do you have to get through to reach your destination?
11. Case Study: Scientific dataset
USB stick of LSHTM dataset containing:
• FCS2.0 - tabular data outlining experiments to count cells, sort them & identify biomarkers
• Leica Experiment Collection - .lei library file & associated images with embedded metadata
Challenges:
• Domain & proprietary formats
– FITS (file) provides limited info on .lei
– FCS not recognised
• Complex relationship in Leica experiment - recorded in filename & internal manifest
(partial) Solution
• Store files as-is
• Obtain text output of FCS files
• Analyse using open source tools
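Where a general identification tool does not recognise a domain format, a small local check can sometimes fill the gap. FCS files begin with an ASCII version identifier such as "FCS2.0", so a minimal sketch of an identification step might look like this; the function name is hypothetical, and the check only reads the identifier, not the rest of the file.

```python
from pathlib import Path
from typing import Optional


def fcs_version(path: Path) -> Optional[str]:
    """Return the version identifier (e.g. 'FCS2.0') if the file starts with one."""
    with path.open("rb") as handle:
        header = handle.read(6)
    # An FCS file opens with a six-character ASCII identifier beginning "FCS".
    if len(header) == 6 and header.startswith(b"FCS"):
        try:
            return header.decode("ascii")
        except UnicodeDecodeError:
            return None
    return None
```

The result can be recorded as technical metadata next to the file stored as-is, so that later users know which version of the format they need tools for.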
12. Understand data
• 17th-18th Century Enlightenment built on information sharing
• Openness & transparency essential for academic research
– Evidence of activity
– Open to scrutiny & replication
• Can you establish who, what, where, when & how?
• How much documentation can only be found in the data creator’s head?
https://www.flickr.com/photos/domiriel/5234590796
14. Final thoughts
1. Analyse your needs & capabilities
– What can you do with existing resources?
– What future investment is possible?
2. Inform users of your expectations from the outset
– File formats
– Documentation
– File structure & naming conventions
– Permissions
3. Help them to fulfil expectations
– Advice and guidance
http://www.keepcalm-o-matic.co.uk/p/keep-calm-and-curate-41/