This document discusses risk management and auditing for digital preservation. It addresses establishing a threat model by understanding what is being preserved and for what purpose. Common threats to digital data include physical medium failure, file format obsolescence, and organizational commitment issues. Audit frameworks like TRAC, DRAMBORA, and SPOT can be used to evaluate repositories, while tools like checksums, migration, and emulation can help mitigate specific risks like bitrot and obsolete formats. Determining file formats and testing file integrity are important for digital preservation.
2. Threat model
•“Preservation,” with no modifier attached, means nothing.
• This is why it becomes such a bogeyman!
•Two things you need to know first:
• why you’re preserving what you’re preserving, and
• what you’re preserving it against.
•Libraries: your collection-development policy
should inform the first question.
• Your coll-dev policy doesn’t include local born-digital or
digitized materials? This is a problem. Fix it.
•The second question is your “threat model.”
9. Why did I just make you do that?
•I’m weird.
•I’m trying to destroy the myth that any given
medium “preserves itself.”
•Media do not preserve themselves. People
preserve media—or media get bizarrely lucky.
•We need not panic over digital preservation
any more than we panic about print.
•Approach digital preservation the same way
you approach print preservation.
19. ?
Ignorance
•“It’s in Google, so it’s preserved.” (Not even
“Google Books!”)
•“I make backups, so I’m fine.”
•“I have a graduate student who takes care of
these things.”
•“Metadata? What’s that? I have to have it?”
•“Digital preservation is an unsolvable problem,
so why even try?” (I’ve heard this one from
librarians. I bet you have too.)
22. Audit frameworks
• Trusted Repository Audit Checklist
• (If you see “NARA/RLG” somewhere? This is the framework that
evolved into TRAC. Long story.)
• You can get an actual formal TRAC audit from CRL! Who has? Portico,
Hathi, “Chronicle of Life,” two-three others. This audit is HARSH. (So
don’t write off a repo because it hasn’t had a TRAC audit.)
• If you hear the phrase “trusted digital repository,” it should mean
that the repo has had (or is pursuing) a TRAC audit.
• DRAMBORA
• More flexible, less finger-shaking than TRAC.
• Less of this “designated community” nonsense.
• Less dependent on OAIS model (which I consider a strength).
• Encourages archives to consider and document their individual
situations and think hard about risk mitigation.
23. Newer: SPOT model
•Even less clunky than DRAMBORA.
•I quite like this one.
•Identifying Threats to Successful Digital
Preservation: the SPOT Model for Risk
Assessment
• http://www.dlib.org/dlib/september12/vermaaten/09vermaaten.html
24. So what do they audit?
•Mission (and adherence to it)
•Plans and policies
• including contingency plans
•Staff infrastructure
•Operations documentation
• including tech infrastructure, service infrastructure
•Sustainable funding
•“Doing the right things with the stuff.”
• identifiers, ingest, file format management, migration, etc.
•NOTICE WHAT’S FIRST ON THE LIST.
• remember, the tech part is the easy part!
25. TRAC, DRAMBORA, and DH
•TRAC, DRAMBORA, and SPOT are designed to
audit repositories, not individual datasets, data
files, or research projects.
• They assume a lot of infrastructure and (in TRAC’s case) a
long-term time horizon that you probably don’t have.
•So if you’re trying to think through a project,
where do you go?
• TRAC and DRAMBORA are probably overkill!
• (Though parts of DRAMBORA won’t hurt you.)
26. Data Curation Profiles
•Research project out of Purdue’s Digital Data
Curation Center (“D2C2”)
•“Toolkit:” interview instrument, user guide for
interview instrument, worksheet.
•Small library of completed profiles
•Ignore the user guide. Grab the worksheet, and
use the interview instrument for reference.
•http://datacurationprofiles.org
• You have to make a login to download the toolkit pieces.
28. Physical medium failure
•Gold CDs are not the panacea we thought.
• They’re not bad; they’re just hard to audit, so they fail
(when they fail) silently. Silent failure is DEADLY.
•Current state of the art: get it on spinning disk.
•Back up often. Distribute your backups
geographically. Test them now and then.
• Consider a LOCKSS cooperative agreement. Others have.
•Bitrot-detection techniques may help here too.
•Any physical medium WILL FAIL. Have a plan
for when it does.
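One way to act on “test them now and then” is to compare a backup against the primary copy byte by byte. The sketch below is an illustration only, not something from the original deck: the function name `find_backup_problems` and the idea of two mirrored directory trees are assumptions, and it uses Python’s standard `filecmp` module.

```python
import filecmp
from pathlib import Path


def find_backup_problems(primary: Path, backup: Path) -> list[str]:
    """List every file under `primary` that is missing from, or differs in, `backup`.

    Hypothetical helper: assumes the backup mirrors the primary's folder layout.
    """
    problems = []
    for original in sorted(p for p in primary.rglob("*") if p.is_file()):
        relative = original.relative_to(primary)
        copy = backup / relative
        if not copy.is_file():
            problems.append(f"missing: {relative.as_posix()}")
        # shallow=False forces a byte-by-byte comparison instead of
        # trusting file size and modification time alone.
        elif not filecmp.cmp(original, copy, shallow=False):
            problems.append(f"differs: {relative.as_posix()}")
    return problems
```

An empty list means every primary file has an identical copy in the backup. This reads every byte of both trees, so it suits an occasional spot check rather than a daily job.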
29. “Digital forensics”
•The art and science of investigating digital file
formats and media.
• Reading obsolete ones.
• Reverse-engineering and/or documenting existing ones so
they don’t go obsolete.
• Ensuring secure deletion, when necessary.
• Reconstructing what used to be on a physical storage
medium. (Surprising how often this is possible!)
• Audit trails for legal and records-management purposes.
• AMAZING report (highly highly recommended!): “Digital
Forensics and Born-Digital Content in Cultural Heritage
Institutions.” http://www.clir.org/pubs/abstract/pub149abst.html.
Both computer-nerdy and humanities-nerdy in the best possible way.
30. Avoiding “bitrot”
•Sometimes used for “file format obsolescence.”
•I use it for “the bits flipped unexpectedly.”
•Checking a file bit-by-bit against a backup copy
is computationally impractical for every day.
• Though on ingest it’s a good idea to verify bit-by-bit!
•Checksums
• A file is, fundamentally, a great big number.
• Do math on the number. Store the result as metadata.
• To check for bitrot, redo the math and check the answer
against the stored result. If they’re different, scream.
• Several checksum algorithms; for our purposes, which one
you use doesn’t matter much.
• “Hash collision:” it’s possible, but unlikely, for different files
to have the same checksum. Potential hack vector!
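The checksum routine described above can be sketched in a few lines of Python with the standard `hashlib` module. This is a minimal illustration under stated assumptions, not the deck’s own tooling: the function names `checksum` and `is_intact` are made up for the example, and SHA-256 is simply the default chosen here. For catching accidental bit flips almost any algorithm works; a current hash is preferable if deliberate tampering is a concern, because practical collisions are known for MD5 and SHA-1.

```python
import hashlib
from pathlib import Path


def checksum(path: Path, algorithm: str = "sha256") -> str:
    """Do the 'math on the number': hash the file's bytes and return a hex digest."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        # Read in 1 MiB chunks so large files never have to fit in memory.
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path: Path, stored: str, algorithm: str = "sha256") -> bool:
    """Redo the math and compare against the digest stored as metadata."""
    return checksum(path, algorithm) == stored
```

In use, `checksum(path)` would be recorded at ingest alongside the file’s other metadata, and `is_intact(path, stored)` rerun on a schedule; a `False` result is the cue to scream and reach for a backup.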
31. Migration vs. emulation: dealing with obsolescence
•Migration
• change the file to be usable in new software/hardware
configurations
• risks: information loss (FONTS!), imperfect transfer,
choosing the wrong migration path
• smart systems don’t throw away the old files!
•Emulation
• keep the file, train new software/hardware to behave like
the old
• risks: imperfect emulation, impractical emulation
• makes more sense for software (games!), less for files
•Pragmatically: redigitization.
32. Finding tools
•Migration
• Current versions of the original software may be able to
open old files.
• Open-source software in the same genre may be able to
translate proprietary file formats (often imperfectly). These
projects tend to maintain translators longer than you’d think.
• Look on the web!
• MIGRATE FAST. Once it’s damaged or obsolete, it’s
probably too late.
•Emulation
• look for the gamers! it’s WILD what they’ll emulate!
• Look to the open-source community for operating-system and
hardware-driver emulators.
• Frankly, there’s a lot of hype and vaporware here.
33. When is a PDF not a PDF?
•When it’s a .doc with the wrong file extension
•When there’s no file extension on it at all
•When it’s so old it doesn’t follow the
standardized PDF conventions
•When it’s otherwise malformed, made by a
bad piece of software.
•How do you know whether you have a good
PDF? (Or .doc, or .jpg, or .xml, or anything else.)
34. File format registries and testing tools
•JHOVE: JSTOR/Harvard Object Validation
Environment
• Java software intended to be pluggable into other
software environments
• Answers “What format is this thing?” and “Is this thing a
good example of the format?”
• Limited repertoire of formats
•PRONOM/DROID + GDFR = Unified Digital
Formats Registry
•Wrapper tool: FITS, File Information Tool Set
• JHOVE + DROID + various other testers. State of the art.
35. Thanks!
•Copyright 2011 by Dorothea Salo.
•This lecture and slide deck are licensed under a
Creative Commons Attribution 3.0 United
States License.