A dry-run of content I wanted to present to an Australian Society of Archivists workshop 21 October 2016.
This trial run was at Archives New Zealand on 28 September 2016.
4. 2014-06-20: Play It Again Conference Report: http://bit.ly/2d8Bnw0 (playitagain.org)
2014-11-25: The Reality of Digital Transfer: http://bit.ly/2ctxocQ (slideshare.net)
5. We (Archives NZ) have got quite far… but there's still a lot more to do…
6. So let's remind ourselves: what is the point?
● Work in concert with agencies and their consultants.
● Generate better information and records management.
● Cleaner transfers...
● Create a more open and transparent government where the digital record is concerned...
● DIA’s line... Support New Zealanders to build strong communities by providing access to trusted information and knowledge.
7. And! Digital Preservation
● At this point in time, idiomatic methods of preservation are still forming...
● Whatever the future of archival custodianship...
● Or the future of digital preservation...
● Techniques need to be developed to support agencies with information and records management, and memory institutions with long-term custodianship.
● Don't fall into the processing trap...
8. What can we identify as important?
● Infrastructure/team, supported by the organisation
● Some things work, some don’t; some change... be flexible.
● Work iteratively...
● Look at what you can do...
● Continue to develop... evidence, real use-cases
11. Policy...
● Has been a constant in my time here.
● Was a draw for me when starting in NZ.
● Sets the rules by which we can play…
● Literally, play: bend, don't break.
● Achieved through careful stakeholder consultation and consideration of impact.
● Sign-off process at director level.
● Two favourite policies: checksum and pre-conditioning.
12. Team...
● We could always do with more people…
● But we recognise that we've been allowed more folk dedicated to this than some places.
● The team is supported in its decision making and its skills.
● Breakdown: curious; driven; up-to-date; a drive to ‘solve’ born-digital transfer; different but complementary skills… *passion*!
● (And opinionated! ;-) )
● It doesn’t always look that way, but there is a certain amount of leeway from IT support too...
13. Technology...?
Rosetta by Ex Libris is the long-term preservation system; it allows us to manage some quite complex bits 'n' pieces… but it:
● Does not yet enable transfer from Agency-to-Archives (it supports)
● Is not a clearing house for records
● Does not spot preservation risks up-front
● Doesn't 'do' sentencing…
● Does not build ingest packages…
● Does not 'do' archival description...
● Does not contain every tool under the sun to handle all the file formats…
Machine Learning: http://nautil.us/blog/the-fundamental-limits-of-machine-learning
14. The processes we need are biased toward transfer and ingest…
Rosetta can only help so much…
[Timeline: Creation → Transfer (life of a record, ~25 years) → life of an archive (~∞)]
The other processes we will still need will be about (active) long-term custodianship…
Rosetta is still only beginning that journey...
15. The miscellany in this presentation...
A story about the tools that can help us...
● Technical Registries (of practice)
● DROID/Siegfried Analysis Report
● Fuzzy Hashes
18. With everything we need to do… we cannot action it all at the same time...
19. Knowledge needs to remain alive and accessible; record it:
Source: https://commons.wikimedia.org/wiki/Category:Kanban#/media/File:Simple_Task_Kanban.jpg
22. DROID/Siegfried Analysis Report
● Example of changing needs and capability
● Initially a plain-text reporting tool
● Evolved into a 'team' tool…
● Evolving into an organisation’s tool…
● Hopefully a community tool…
● Our first port of call for any transfer...
* Marriage of DROID and Siegfried: http://bit.ly/2ddS0IP
* A little bit more about the tool: http://bit.ly/2dii3jP
23. DROID/Siegfried Analysis Report
● Available to all the community (December 2013): http://bit.ly/2cB8gFY
● Maps DROID and Siegfried output to an SQLite database for querying power and speed.
● Aside from Python, ZERO dependencies: users need to be able to download it and go...
● Complete flexibility over output.
● TXT, HTML, Rogues, Heroes… write your own!
● Normalization via database layer: abstracted for multiple ID tools.
● The tools each do what they're supposed to do well; the dissection of output can be left to others.
* Marriage of DROID and Siegfried (OPF Blog): http://bit.ly/2ddS0IP
* A little bit more about the tool (OPF Blog): http://bit.ly/2dii3jP
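The tool itself is linked above. As a rough illustration of the idea on this slide (identification output loaded into SQLite so that reports become plain SQL queries), here is a minimal sketch using only Python's standard library. The CSV columns and sample rows are hypothetical and heavily simplified, loosely modelled on a DROID CSV export; the table name and the functions `load_export` and `format_summary` are invented for illustration and are not the tool's actual schema or code.

```python
import csv
import io
import sqlite3

# Hypothetical, heavily simplified DROID-style CSV export (illustrative rows only).
# Real DROID exports carry many more columns than the five modelled here.
SAMPLE_CSV = """ID,NAME,TYPE,PUID,FORMAT_NAME
1,report.doc,File,fmt/40,Microsoft Word Document
2,minutes.doc,File,fmt/40,Microsoft Word Document
3,photo.jpg,File,fmt/43,JPEG File Interchange Format
4,notes.txt,File,x-fmt/111,Plain Text File
5,records,Folder,,
"""


def load_export(conn: sqlite3.Connection, csv_text: str) -> None:
    """Load a DROID-style CSV export into a single SQLite table."""
    conn.execute(
        "CREATE TABLE files (id INTEGER, name TEXT, type TEXT, "
        "puid TEXT, format_name TEXT)"
    )
    rows = csv.DictReader(io.StringIO(csv_text))
    # Named placeholders are filled from each row's dict, keyed by CSV header.
    conn.executemany(
        "INSERT INTO files VALUES (:ID, :NAME, :TYPE, :PUID, :FORMAT_NAME)",
        rows,
    )


def format_summary(conn: sqlite3.Connection) -> list[tuple[str, str, int]]:
    """Count files (not folders) per format identifier, most common first."""
    query = (
        "SELECT puid, format_name, COUNT(*) AS n FROM files "
        "WHERE type = 'File' GROUP BY puid, format_name "
        "ORDER BY n DESC, puid"
    )
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_export(conn, SAMPLE_CSV)
    for puid, name, count in format_summary(conn):
        print(f"{puid:10} {count:3}  {name}")
```

Once the output sits in a database, each new report (a TXT listing, an HTML page, a list of "rogues") is just another query over the same table, which is the flexibility the slide describes.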
28. Benefits...
● Sets a baseline for a lingua franca… beginners and experts alike...
● Definitions contributed by our archivists!
● Easier on the eye
● Re-factored to be more flexible
● Give it a try! Let us know how it goes!
31. Checksums
● Looking to be unique
– De-duplication
– Fixity
● No connection between
– Security function
– Cannot reverse
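A minimal sketch of the two uses listed on this slide, fixity and de-duplication, using SHA-256 from Python's standard `hashlib`. The choice of algorithm, the file names and their contents are assumptions for illustration only. Comparing the digests of the two near-identical files also illustrates the "no connection" point: a one-character change yields a digest with no visible relationship to the original, which is part of what lets checksums serve a security function and makes them impractical to reverse.

```python
import hashlib
from collections import defaultdict


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def verify_fixity(data: bytes, recorded: str) -> bool:
    """Fixity: content is unchanged if its digest matches the one recorded earlier."""
    return sha256_hex(data) == recorded


def find_duplicates(files: dict[str, bytes]) -> list[list[str]]:
    """De-duplication: group file names whose contents share a digest."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name, data in files.items():
        groups[sha256_hex(data)].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


if __name__ == "__main__":
    # Hypothetical files: two identical copies and one near-identical file.
    files = {
        "a.txt": b"Minutes of meeting, 3 March 2008",
        "copy-of-a.txt": b"Minutes of meeting, 3 March 2008",
        "b.txt": b"Minutes of meeting, 4 March 2008",
    }
    print(find_duplicates(files))  # [['a.txt', 'copy-of-a.txt']]
    recorded = sha256_hex(files["a.txt"])
    print(verify_fixity(files["a.txt"], recorded))  # True
    # One changed character, two unrelated-looking digests:
    print(sha256_hex(files["a.txt"]))
    print(sha256_hex(files["b.txt"]))
```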
32. But every file has a connection...
● Binary
● File Format
● Textual Content
● Embedded Content
● Template
● Author
● Like DNA, with many different strands to dissect...
● Fuzzy Hashing!
35. And they look like...
● aad371039d588b43e02887f87e570f6d2b1a7f1da89667ef11227d9b3e706610d8e12d
● 0dc36013dd088b43e02983f87e534e6d2b1a7f1da88627ef11267d8b3e716610d9e16d
● Not that different from regular checksums!
● But they help us to demonstrate a closer relationship between files…
● “The sum of the parts is greater than the whole.” ~ Aristotle
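Fuzzy hashes are produced by dedicated algorithms, and the slide does not say which one generated the digests above. As a stand-in, the sketch below uses a different, well-known similarity technique, MinHash over overlapping byte shingles, to show the principle: similar inputs produce signatures that agree in many positions, whereas a regular checksum changes completely. The signature length, the shingle size and the use of salted BLAKE2b as the family of hash functions are arbitrary assumptions; this is not ssdeep, TLSH or any tool used at Archives NZ.

```python
import hashlib

NUM_HASHES = 64  # signature length: an arbitrary choice for this sketch
SHINGLE = 4      # bytes per overlapping window: also arbitrary


def shingles(data: bytes, size: int = SHINGLE) -> set[bytes]:
    """Overlapping byte windows: the 'parts' whose overlap we measure."""
    return {data[i:i + size] for i in range(max(len(data) - size + 1, 1))}


def minhash(data: bytes, k: int = NUM_HASHES) -> list[int]:
    """MinHash signature: for each of k salted hash functions, keep the
    smallest hash value seen across all of the input's shingles."""
    parts = shingles(data)
    signature = []
    for seed in range(k):
        salt = seed.to_bytes(16, "little")  # BLAKE2b accepts up to 16 salt bytes
        signature.append(min(
            int.from_bytes(hashlib.blake2b(p, digest_size=8, salt=salt).digest(), "big")
            for p in parts
        ))
    return signature


def similarity(sig_a: list[int], sig_b: list[int]) -> float:
    """Fraction of matching positions: an estimate of the Jaccard similarity
    of the two inputs' shingle sets (exactly 1.0 for identical content)."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


if __name__ == "__main__":
    original = (b"Archives New Zealand received a transfer of born-digital "
                b"records from a government agency in 2008.")
    edited = original.replace(b"2008", b"2009")
    unrelated = (b"Checksums support fixity checking and the de-duplication "
                 b"of identical files.")
    sig = minhash(original)
    print(similarity(sig, minhash(edited)))
    print(similarity(sig, minhash(unrelated)))
```

The edited copy should score far higher than the unrelated text, even though the SHA-256 digests of `original` and `edited` would share nothing recognisable.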
40. How can we use this?
● Sentencing... while still teaching our machines, we can still close the net while looking at records manually…
● Discovery: Amazon-like results: You might also like this record!
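As a hypothetical sketch of the "you might also like this record" idea, the snippet below ranks records in a small collection by how much text they share with a target record, using Jaccard similarity over character shingles. The record IDs and titles are invented, and the functions are illustrative only; a real discovery service would more likely compare precomputed fuzzy hashes than raw text, and this does not describe any Archives NZ system.

```python
def shingle_set(text: str, size: int = 4) -> set[str]:
    """Overlapping character windows after normalising case and whitespace."""
    text = " ".join(text.lower().split())
    return {text[i:i + size] for i in range(max(len(text) - size + 1, 1))}


def jaccard(a: set[str], b: set[str]) -> float:
    """Shared shingles divided by all distinct shingles (sets are never empty here)."""
    return len(a & b) / len(a | b)


def also_like(target: str, collection: dict[str, str],
              top_n: int = 3) -> list[tuple[str, float]]:
    """Return up to top_n (record_id, score) pairs, most similar first."""
    target_set = shingle_set(target)
    scored = [(rid, jaccard(target_set, shingle_set(text)))
              for rid, text in collection.items()]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))  # ties broken by record ID
    return scored[:top_n]


if __name__ == "__main__":
    # Hypothetical record IDs and titles.
    collection = {
        "R1": "Board minutes, April 2008 meeting",
        "R2": "Staff travel policy 2008",
        "R3": "Board minutes, March 2007 meeting",
    }
    for rid, score in also_like("Board minutes, March 2008 meeting", collection):
        print(rid, round(score, 2))
```

Sorting on the negated score keeps the most similar records first while giving a stable, predictable order when two records score the same.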
41. The experiment continues...
● Matches are relative to themselves...
● Algorithms make a difference...
● And perhaps, like genetics... some traits are more dominant than others...
● Consider working with content in different ways...
– Utilize format bias... normalize
– Separate content from structure and analyse?
● Keep trying things, but at minimum cost... (another agile concept: minimum viable product)
43. Conclusion: A bit more miscellany
● Keyword: Interim
● Our needs change constantly, and there's a lot to do…
● Don't suffer paralysis by analysis.
● Do a requirements analysis.
● Look at what you can do (minimum viable product) and iterate...
44. Conclusion: A bit more miscellany
● Lots of hints to bits 'n' pieces I haven't been able to talk about:
● Role of the community… (They/We're here to help! Same problems!)
● Communication and sharing… (Do it!)
● Software development skills… (There are other ways to be involved)
What's the point? (OPF Blog): http://bit.ly/2ddXnaY
● Maybe also a seed for discussion.