This document summarizes a presentation on automatically identifying relationships between born-digital records. It discusses using algorithms to detect identical, similar, and linked records by analyzing checksums, hashes, hyperlinks, and other embedded metadata. Eight potential relationships are proposed that could be automatically detected: identical, similar, contains hyperlink, contains CMS reference, contains embedded objects, intra-item relationships, object references, and item mentions. Detecting these relationships programmatically could help address the challenge of manually examining large volumes of digital records.
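As a rough illustration of the first and third relationships above ("identical" and "contains hyperlink"), the sketch below groups records by SHA-256 checksum and pulls URLs out of record content with a simple pattern. It is a minimal sketch, not the presenter's method: the function names, the in-memory `records` mapping, and the URL pattern are all assumptions made for the example.

```python
import hashlib
import re
from collections import defaultdict

# Hypothetical, deliberately simple URL pattern; real records would need
# format-aware parsing (e.g. of HTML or office documents).
URL_PATTERN = re.compile(rb"https?://[^\s\"'<>]+")


def sha256_of(content: bytes) -> str:
    """Return the SHA-256 checksum of a record's bytes as hex."""
    return hashlib.sha256(content).hexdigest()


def group_identical(records: dict) -> list:
    """Group record names whose content has the same checksum.

    `records` maps a record name to its raw bytes (an assumption for
    this sketch; a real tool would stream files from disk).
    """
    by_digest = defaultdict(list)
    for name, content in records.items():
        by_digest[sha256_of(content)].append(name)
    # Only groups with more than one member describe an "identical" relationship.
    return [sorted(names) for names in by_digest.values() if len(names) > 1]


def find_hyperlinks(content: bytes) -> list:
    """Return the URLs found in a record's bytes, in order of appearance."""
    return [m.decode("ascii", errors="replace") for m in URL_PATTERN.findall(content)]
```

Matching checksums indicate byte-for-byte identical content; "similar" records would need a different technique, such as fuzzy hashing, which this sketch does not attempt.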
My slides as part of a workshop run by colleagues at Archives NZ to help others understand what a checksum is and how it influences our work.
Covers the concept of hashing, multiple algorithms, and collisions. It is aimed at beginners in digital preservation.
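The ideas named here can be sketched in a few lines of Python. The example below is an assumption-laden illustration rather than content from the slides: it computes checksums for the same bytes with several algorithms, then shows why collisions must exist by truncating SHA-256 to a few hex characters so that two different inputs quickly share the same short value. The function names are hypothetical.

```python
import hashlib


def checksums(data: bytes, algorithms=("md5", "sha1", "sha256")) -> dict:
    """Compute a hex checksum of `data` with each named algorithm."""
    return {name: hashlib.new(name, data).hexdigest() for name in algorithms}


def find_truncated_collision(hex_chars: int = 2):
    """Find two different inputs whose truncated SHA-256 values match.

    Truncating the digest shrinks the output space, so a collision
    appears after only a handful of inputs. Full-length digests collide
    in principle too, but finding one is impractical for SHA-256.
    """
    seen = {}
    counter = 0
    while True:
        data = str(counter).encode()
        short = hashlib.sha256(data).hexdigest()[:hex_chars]
        if short in seen:
            return seen[short], data, short
        seen[short] = data
        counter += 1
```

Calling `checksums(b"")` returns one digest per algorithm for the empty input, and `find_truncated_collision()` returns the two colliding inputs along with the short value they share.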
Time Travelling Analyst: The Things That Only a Time Machine Can Tell Me... Ross Spencer
My time at Archives New Zealand has been my first, truly hands-on experience with born-digital collections. Material transferred in 2008 containing files created over the period of an entire decade has been the focus of my first born-digital ingests with the organisation. The work in the Systems Standards and Strategies team (SSS) at Archives New Zealand has been split into two initial sets of ingests, one set of two followed by another; the idea: to create processes and develop them incrementally. My surprise after the first two ingests back in late November and December 2014, is that five months into the next two, we're still finding challenges - daily! With only the slightest nod to digital preservation and my title as digital preservation analyst, this paper discusses more the difficulties of wrestling core information received from agencies, organizational issues, and the tools available to us in this agency. Organizations and records managers have an opportunity to make recommendations to their users that can ensure issues are minimized when we place records into long-term preservation, and over the next few years we'll collect plenty of evidence to see the number of surprises reduced, but it is this author's assertion that despite best efforts, we're always going to receive badly behaved digital material for reasons not always foreseen, and that, despite concerted efforts at control, any agency receiving born-digital material must be prepared to understand it, and must also be prepared to manage it through different mitigation strategies - depending on appetite. This paper will introduce the challenges faced while processing the organization’s first born-digital material looking at where the issues arose and why, before concluding that we must learn by doing, and that the collection of evidence and understanding 'real world' scenarios is our best opportunity to reduce surprises even if we can’t reduce them to zero.
The Reality of Digital Transfer @ArchivesNZ Ross Spencer
Presentation for Archives New Zealand Records Management Network Event describing the reality of digital transfer. Looking at the potential scale of digital transfers from the largest collections we investigated during the initial transfers project and comparing it to the accession work we're currently investigating at time of writing. A look at some of the challenges involved and how we're tackling those.
ASA Trial Workshop Slides for Archives NZ [2016-09-28] Ross Spencer
A dry-run of content I wanted to present to an Australian Society of Archivists workshop 21 October 2016.
This trial run was at Archives New Zealand on 28 September 2016.
Why do they call it Linked Data when they want to say...? Oscar Corcho
The four Linked Data publishing principles established in 2006 seem to be quite clear and well understood by people inside and outside the core Linked Data and Semantic Web community. However, not only when discussing with outsiders about the goodness of Linked Data but also when reviewing papers for the COLD workshop series, I find myself, in many occasions, going back again to the principles in order to see whether some approach for Web data publication and consumption is actually Linked Data or not. In this talk we will review some of the current approaches that we have for publishing data on the Web, and we will reflect on why it is sometimes so difficult to get into an agreement on what we understand by Linked Data. Furthermore, we will take the opportunity to describe yet another approach that we have been working on recently at the Center for Open Middleware, a joint technology center between Banco Santander and Universidad Politécnica de Madrid, in order to facilitate Linked Data consumption.
An exploration of a possible pipeline for RDF datasets from Timbuctoo instances to the digital archive EASY.
- Get, verify, ingest archive and disseminate (linked) data and metadata.
- What are the implications for an archive: serving linked data over (longer periods of) time
- Practical stuff.
Semantically-Enabled Digital Investigations inbroker
Semantically-enabled Digital Investigations
Leveraging Semantic Web technologies for representing, integrating, correlating and querying multi-domain digital forensic data
Abby Adams discusses Hagley’s implementation of Preservica and the ongoing adventure of creating policies and workflows for digital accessions and collections.
A collaborative approach to "filling the digital preservation gap" for Resear... Jenny Mitcham
A presentation given by Jenny Mitcham at the Northern Collaboration Conference on 10th September 2015 at Leeds. It describes work underway in the "Filling the Digital Preservation Gap" project using Archivematica to preserve research data.
Sailing the Digital Serial Seas: Charting a New Course with CONTENTdm NASIG
The State Library of North Carolina is legally mandated to facilitate public access to publications issued by State agencies and manage the depository system. With the increase of born digital documents and the demand for electronic access, the State Library needed to find a way to support the systematic collection, preservation, and access to state information in digital formats. Focusing on finding repository solutions for digital state publications and based on comparisons among leading products, the library found CONTENTdm to be the best overall fit. With the continuing need to create MARC records for digital documents, CONTENTdm offered functionality to create compound objects for single documents as well as structured serials, providing one permanent URL either way. Working with born digital and digitized serials still presents certain challenges with workflows, providing access, and compensating for the differences between born digital and digitized formats. This presentation discusses the ups and downs of managing digital serials in CONTENTdm, how we do it, and why we do it from the perspective of a mid-size state government library.
Francesca Francis
Assistant State Documents Cataloger, State Library of North Carolina
Raleigh, NC
I assist in the cataloging of original publications created by the state agencies of North Carolina, metadata/class schema/authority creation and management, and catalog problem-solving with a small side of reference desk work at the Government & Heritage Library. Prior to my time at the State Library, I worked part-time on a reference desk in the Cumberland County library system. While living in the DC area, I served as the catalog librarian for the U.S. Census Bureau and worked on a shelf list project with the U.S. GPO. I got my start in the library field when I was selected to work as the cataloging assistant at the law library of Catholic University while earning my MLS. As you may be able to guess, I kind of have a thing for cataloging and providing access to information, whether I'm on deck or in the control room...although I kind of have a penchant for playing the "[wo]man behind the curtain."
Eve Grunberg
Documents Cataloger, State Library of North Carolina
I have been working at the State Library of North Carolina as a documents cataloger since 2006. I am responsible for cataloging everything published by state agencies regardless of the format. Working with different publications has given me a great deal of knowledge and experience with MARC cataloging rules and standards, different classification schemas, authority work, Library of Congress and OCLC cataloging tools, metadata standards, and the creation of controlled vocabularies.
The Digital Archaeological Workflow: A Case Study from Sweden Marcus Smith
The Digital Archaeological Workflow (DAP) is a programme of work being carried out at the Information Development Unit at the Swedish National Heritage Board, in partnership with the major Swedish archaeological stakeholders. The programme aims to streamline the flow of archaeological data (and its associated metadata) between different actors in the Swedish archaeological process, and to ensure that this data is preserved in a sustainable and accessible manner. It aims to address a number of problems which have hampered the practice of archaeology in Sweden for some time, but which have now started to become more acute as digital technology saturates the processes involved.
There is no centralised register of archaeological fieldwork in Sweden, making it difficult not only to keep track of what is going on where, but also to know what fieldwork – if any – has taken place in connection to a particular site in the national sites and monuments record. Sweden also has no central digital archive for the storage of either archaeological fieldwork data or reports; as such records are now produced digitally, valuable archaeological data is thus increasingly at risk of being lost.
Furthermore, despite the fact that almost all of the data and administrative metadata surrounding archaeological work are digital-born, they are still handled according to analogue paradigms, particularly when information must be shared between different organisations. Sources of archaeological data which are currently made available digitally by various national and local bodies are not typically linked together. This leads to inefficiencies in information transfer, duplication of data and effort, and to information describing the same 'objects' being stored in different systems within different organisations.
The DAP programme intends to address these problems over the course of a five-year period, using standardised platform-agnostic data formats and protocols to streamline information transfer between organisations, by releasing a series of open taxonomies and ontologies for common Swedish archaeological terms and concepts on the semantic web in order to facilitate data interoperability, and by creating a secure digital repository both for the raw data and reports arising from fieldwork and research. We aim to make this information freely available as linked open data.
Our overall mapping of the current Swedish archaeological process is complete (although some details remain) and we are currently working on a conceptual model on which our future information architecture will be based. In parallel, we are also working to translate our existing (analogue) archaeological taxonomies to SKOS and release them as linked open data authorities, beginning with the Swedish monument types thesaurus.
Delivered by Peter Burnhill, Director of EDINA, at the PRELIDA Consolidation and Dissemination workshop on 17/18 October 2014 (http://prelida.eu/consolidation-workshop).
Summary: The web changes over time, and significant reference rot inevitably occurs. Web archiving delivers only a 50% chance of success. So in addition to the original URI, the link should be augmented with temporal context to increase robustness.
IIIF and DSpace 7 - IIIF Conference 2023.pdf 4Science
In recent years IIIF has become the “de facto” standard for presenting, navigating and delivering digital images on the web all over the world. It defines several APIs for providing a standard method for describing, analysing and sharing images over the web, as well as "presentation-based metadata" about structured sequences of images. However, images and, in particular, cultural heritage images, to be fully analysed, interpreted and enjoyed, should be placed in a “virtual ecosystem” in which they can be related to entities such as people, places, events, fonds, etc., according to different visions and interpretations.
Therefore, since 2017, we have been working on integrating IIIF in a Digital Library environment based on DSpace, the most used open source Digital Asset Management System, developing a dedicated add-on (starting from version 5), easily integrated with a set of external Image Servers, such as Cantaloupe or Digilib, and on extending the DSpace data model as well, to structure contextual relationships among cultural heritage entities at different levels.
After the DSpace 7 release, we worked with the community on integrating IIIF support in the official DSpace codebase. Now the DSpace REST API implements the IIIF Presentation API version 2.1.1, the IIIF Image API version 2.1.1, and the IIIF Search API version 1.0 (experimental). Any IIIF compliant image server can be integrated. The DSpace Angular frontend uses the Mirador 3.0 viewer.
However, Digital Library requirements are getting more and more complex. Therefore, to fulfil the needs of the cultural heritage domain, we enhanced our solutions based on DSpace 7, developing two further add-ons to integrate and enrich the “IIIF experience” within DSpace: the Document Viewer (for visualizing PDF files within Mirador) and the OCR module (for extracting text from images and indexing it).
Integrating IIIF and DSpace 7 and enriching the platform with new features, it has been possible to go beyond the traditional boundaries of the Digital libraries, structuring a complex system of relationships, building new narratives thanks to interdisciplinarity and the coexistence of different domains.
The proposed two-hour workshop, addressed to librarians, archivists, historians, archaeologists, researchers and all those who want to build their own digital library with DSpace 7 and IIIF, will introduce the attendees to the IIIF integration in DSpace from both the backend and the frontend side.
We will analyze and share our approach and standard workflows for managing cultural heritage documents in DSpace using IIIF, starting with images submission and describing the operations required to make images available to the Mirador Image Viewer, the ones for extracting the text via OCR and for visualizing PDFs through the Image Viewer. Moreover, we will show how to relate items to each other, in order to build a complex system of relationships between entities, to be explored through network graphs.
“Filling the digital preservation gap” an update from the Jisc Research Data ... Jenny Mitcham
Presentation given to the Hydra Preservation Interest Group by Jenny Mitcham on the Jisc Research Data Spring project "Filling the Digital Preservation Gap".
A lecture/conversation focusing on the first 12 years of Semantic Web - delivered on February 21, 2012.
See http://j.mp/SWIntro for more details. More detailed course material is at http://knoesis.org/courses/web3/
A North Carolina Connecting to Collections (C2C) workshop co-taught by Audra Eagle Yun (WFU), Nicholas Graham (UNC), and Lisa Gregory (State Archives of NC). This workshop took place on June 13, 2011 in Wilson, NC.
MuseoTorino, first Italian project using a GraphDB, RDFa, Linked Open Data 21Style
MuseoTorino is the first Italian project using Web 3.0 technologies: a NoSQL graph database (Neo4j), RDFa, and Linked Open Data.
MuseoTorino is a 21style (www.21-style.com) project for the municipality of Torino, Italy.
These slides come from CodeMotion, the best Italian conference for developers and IT enthusiasts!
Open data is a crucial prerequisite for inventing and disseminating the innovative practices needed for agricultural development. To be usable, data must not just be open in principle—i.e., covered by licenses that allow re-use. Data must also be published in a technical form that allows it to be integrated into a wide range of applications. The webinar will be of interest to any institution seeking ways to publish and curate data in the Linked Data cloud.
This webinar describes the technical solutions adopted by a widely diverse global network of agricultural research institutes for publishing research results. The talk focuses on AGRIS, a central and widely-used resource linking agricultural datasets for easy consumption, and AgriDrupal, an adaptation of the popular, open-source content management system Drupal optimized for producing and consuming linked datasets.
Agricultural research institutes in developing countries share many of the constraints faced by libraries and other documentation centers, and not just in developing countries: institutions are expected to expose their information on the Web in a re-usable form with shoestring budgets and with technical staff working in local languages and continually lured by higher-paying work in the private sector. Technical solutions must be easy to adopt and freely available.
Working with data is a challenge for many organizations. Nonprofits in particular may need to collect and analyze sensitive, incomplete, and/or biased historical data about people. In this talk, Dr. Cori Faklaris of UNC Charlotte provides an overview of current AI capabilities and weaknesses to consider when integrating current AI technologies into the data workflow. The talk is organized around three takeaways: (1) For better or sometimes worse, AI provides you with “infinite interns.” (2) Give people permission & guardrails to learn what works with these “interns” and what doesn’t. (3) Create a roadmap for adding in more AI to assist nonprofit work, along with strategies for bias mitigation.
RFP for Reno's Community Assistance Center This Is Reno
Property appraisals completed in May for downtown Reno’s Community Assistance and Triage Centers (CAC) reveal that repairing the buildings to bring them back into service would cost an estimated $10.1 million—nearly four times the amount previously reported by city staff.
Similar to Binary Trees? Automatically identifying the links between born-digital records
This webinar describes the technical solutions adopted by a widely diverse global network of agricultural research institutes for publishing research results. The talk focuses on AGRIS, a central and widely-used resource linking agricultural datasets for easy consumption, and AgriDrupal, an adaptation of the popular, open-source content management system Drupal optimized for producing and consuming linked datasets.
Agricultural research institutes in developing countries share many of the constraints faced by libraries and other documentation centers, and not just in developing countries: institutions are expected to expose their information on the Web in a re-usable form with shoestring budgets and with technical staff working in local languages and continually lured by higher-paying work in the private sector. Technical solutions must be easy to adopt and freely available.
Similar to Binary Trees? Automatically identifying the links between born-digital records (20)
Binary Trees? Automatically identifying the links between born-digital records
1. Australian Society of Archivists
Conference 2016, Parramatta
Session 5: Description and Innovation
Binary Trees? Automatically Identifying
the links between born-digital records.
Ross Spencer
Digital Preservation Analyst
Systems Strategy and Standards team
4. Department of Internal Affairs
But that looks like a network graph?!
• It is!
• Records (Items) connected across many recordkeeping and
archival contexts
• Across Functions; People; Agency; Subject; Context; References;
Date; File Format...
• No boundaries!
@ArchivesNZ:ItemA -> references -> @DOC:ItemB
5. Department of Internal Affairs
We know this...
• Continuum model (Multiple contexts over space and time)
• ICA Draft Conceptual Model (RiC)
• 73 Record Relations RiC-R1 to RiC-R73
• Three of which we might be able to (more easily) automate?
• Has Copy; Is Copy Of; Has Part
• Wherein (I suggest) lies the issue...
6. Department of Internal Affairs
Archives NZ Context
2011 Archives New Zealand developed its new conceptual model and metadata
schema for archival description.
Designed to accommodate description of born-digital records.
much discussion among archivists about the practicalities of describing relationships
between items.
It was acknowledged that, given the volumes of digital records likely to be in each
transfer, neither agency nor Archives staff were likely to examine the content of items
visually one-by-one to determine which other items they referred to...
~ Talei Masters
7. Department of Internal Affairs
What then do we do?
• Mathematical properties of digital files...
• Signals ->
• Numbers ->
• Encoding Schemes (UTF8, ASCII) ->
• Data Structures ->
• File Formats -> User Content.
• Reduce again to a series of numbers that we can interpret to use
numerical properties:
• Greater than; less than; equal to; not equal to...
8. Department of Internal Affairs
In the relationship between numbers we can find the
relationships between records
9. Department of Internal Affairs
Relations we might be able to create...
• Relationship One: Is Identical
• Relationship Two: Is Similar
• Relationship Three: Contains Hyperlink
• Relationship Four: Contains CMS Reference
• Relationship Five: Contains Embedded Digital Objects
• Relationship Six: Contains Intra-Item Relationships
• Relationship Seven: Contains Object References
• Relationship Eight: Item Mentions
10. Department of Internal Affairs
Relationship One: Is Identical
• We often have checksums available in a digital repository
• First comparison in a digital transfer...
Does Checksum A still equal Checksum A?
• If yes, accept, continue to transfer...
• If no... reject! Inspect!
• Expose this information in the catalogue and compare; what happens?
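The comparison above can be sketched in a few lines of shell. This is only an illustration, assuming GNU coreutils (sha256sum) is available; find_identical is a hypothetical helper name and SHA-256 is an assumed choice of algorithm, not necessarily the one used in the repository. It groups files by checksum and prints every group that has more than one member, which is the raw material for an "Is Identical" relation.

```shell
# Sketch only: list files under a directory that share a checksum.
# Output lines are "<checksum>  <path>", printed only for checksums
# that occur more than once. find_identical is a hypothetical name.
find_identical () {
    find "$1" -type f -exec sha256sum {} + \
        | sort \
        | awk '{
            if ($1 == prev) {
                # Print the first member of the group once, then each repeat
                if (!shown) print prevline
                print
                shown = 1
            } else {
                shown = 0
            }
            prev = $1
            prevline = $0
        }'
}

# Usage (path is an example only):
#   find_identical /home/digital-preservation/accessions
```

Files with equal checksums land on adjacent lines after sorting, so a single pass is enough to pick out the groups.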
11. Department of Internal Affairs
Relationship One: Is Identical
Archival Context A
Record Keeping System A
Archival Context B
Record Keeping System B
12. Department of Internal Affairs
Relationship Two: Is Similar
• MD5 (Rivest, 1992):
• File A (Zero changes):
8c69dc0668c4c73092a7042df45e756adb170742
• File B (1 Byte Removed):
6b75b8f235c148efd1b03d9c113664895b5aa7cd
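A minimal sketch of why a cryptographic digest cannot express "similar": remove a single byte and the whole digest changes. The helper names digest_of and digest_minus_first_byte are hypothetical, and GNU coreutils (md5sum, sha1sum) is assumed. Note that an MD5 digest is 32 hexadecimal characters, while the 40-character values on the slide are the length of a SHA-1 digest, so the helpers take the digest command as a parameter.

```shell
# Sketch only: compare the digest of a file with the digest of the same
# file after one byte has been removed. Helper names are hypothetical.

# digest_of <command> <file>, e.g. digest_of md5sum report.doc
digest_of () {
    "$1" < "$2" | cut -d ' ' -f 1
}

# Same as digest_of, but skips the first byte of the file
digest_minus_first_byte () {
    tail -c +2 "$2" | "$1" | cut -d ' ' -f 1
}

# Usage (file name is an example only):
#   digest_of md5sum report.doc
#   digest_minus_first_byte md5sum report.doc
```

The two values share nothing a catalogue could use to say the files are related, which is what motivates the fuzzy hashes on the following slides.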
13. Department of Internal Affairs
Relationship Two: Is Similar
• SSDEEP (Kornblum, 2006):
• File A (Zero changes):
1536:tLQy16aYRCWYTESSg3yDuBCwclnHpQ/B4kCK7ZBEY0t5vykp6CYP:q1aYpYTESSgM2CwQGt9ZBB1U6hP
• File C (First 250 Bytes Removed*):
1536:CLQy16aYRCWYTESSg3yDuBCwclnHpQ/B4kCK7ZBEY0t5vykp6CYP:B1aYpYTESSgM2CwQGt9ZBB1U6hP
*Less than two tweets (140 bytes)
14. Department of Internal Affairs
Relationship Two: Is Similar
• First experiments, SSDEEP (Kornblum), TLSH* (Oliver et al.)
• Oliver et al. (2014) Thresholds should be tuned for each application
• First application is item level sentencing during transfer feasibility
investigations
• Manually sentence... 10 records per hour
• Follow links to those not of archival value...
* Trend Micro Locality Sensitive Hash!
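To make the idea of a tunable threshold concrete, here is a deliberately crude stand-in. It is not SSDEEP or TLSH: it scores two text files by the percentage of distinct lines they share (a Jaccard-style overlap). The name line_overlap_percent and the example threshold of 80 are assumptions for illustration only; real experiments would use the fuzzy-hash tools named above.

```shell
# Sketch only: NOT ssdeep or TLSH. Scores two text files from 0 to 100 by
# the share of distinct lines they have in common.
# line_overlap_percent is a hypothetical helper name.
line_overlap_percent () {
    local a b both total
    a=$(mktemp)
    b=$(mktemp)
    sort -u "$1" > "$a"
    sort -u "$2" > "$b"
    both=$(comm -12 "$a" "$b" | wc -l)    # lines present in both files
    total=$(sort -u "$a" "$b" | wc -l)    # distinct lines across both files
    rm -f "$a" "$b"
    if [ "$total" -eq 0 ]; then
        echo 0
    else
        echo $(( 100 * both / total ))
    fi
}

# Usage, with an assumed threshold of 80 that would need tuning:
#   score=$(line_overlap_percent a.txt b.txt)
#   [ "$score" -ge 80 ] && echo "a.txt is similar to b.txt ($score)"
```

Whatever measure is used, the decision reduces to a number compared against a threshold, and the right threshold depends on the application.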
18. Department of Internal Affairs
Relationship Two: Is Similar
You liked this record... you might also like...
19. Department of Internal Affairs
Relationship Three: Contains HTTP://
• Burnhill et al. (2015)
• 64,000 e-theses, 46,000 pointed out to external sources
• Websites, external files, etc.
20. Department of Internal Affairs
Relationship Three: Contains HTTP://
#!/bin/bash
set -e
# Location of the files to analyse
FILES='/home/digital-preservation/accessions'
# Print the lines of extracted text that contain an http:// link
dp_analysis ()
{
   echo -e $(catdoc "$1" | grep "http://") | tr -d '[:cntrl:]'
   echo
}
# Find loop: split on newlines only so paths containing spaces survive
oIFS=$IFS
IFS=$'\n'
time (find "$FILES" -type f | while read -r file; do
   dp_analysis "$file"
done)
IFS=$oIFS
21. Department of Internal Affairs
Relationship Three: Contains HTTP://
• https://gist.github.com/ross-spencer/a6411a021afb7de7e3dc6dd713f7b520
• ~5059 parseable born-digital records
• ~4800 lines contained hyperlinks
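A possible next step, sketched here under assumptions, is to pull out the links themselves rather than whole lines, so each one can become a "Contains Hyperlink" relation. extract_links is a hypothetical helper name, and the pattern is a simple approximation that also accepts https:// and may keep trailing punctuation.

```shell
# Sketch only: read text on stdin, print each distinct link with a count,
# most frequent first. extract_links is a hypothetical helper name.
# The pattern is approximate: a link ends at whitespace, <, > or a quote,
# so trailing punctuation such as a full stop is kept.
extract_links () {
    grep -oE 'https?://[^[:space:]<>"]+' | sort | uniq -c | sort -rn
}

# Usage (catdoc as in the script on the previous slide):
#   catdoc "$file" | extract_links
```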
22. Department of Internal Affairs
Relationship Four: Contains CMS Reference
echo -e $(catdoc "$file" | grep -E "A[0-9]{6}")
• Matches the Archway catalogue reference number, e.g.
A204050; A123456; and not AZ12345
• CMS reference could be sent alongside transfer metadata
for such searches.
• Flag existence (at least) - FYI to the end user – be that the
transfer archivist, the agency, or the researcher
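A small sketch of flagging the references themselves, assuming GNU grep. extract_cms_refs is a hypothetical helper name. The whole-word option (-w) is an added assumption so that a longer run of digits, or an A preceded by another letter, is not reported as a reference.

```shell
# Sketch only: read text on stdin and print each distinct string that looks
# like an Archway reference (A followed by exactly six digits).
# extract_cms_refs is a hypothetical helper name.
extract_cms_refs () {
    grep -owE 'A[0-9]{6}' | sort -u
}

# Usage:
#   catdoc "$file" | extract_cms_refs
```

Printing the matched references rather than the matching lines makes it straightforward to check each one against the catalogue.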
23. Department of Internal Affairs
Relationship Five: Contains Embedded Object
$ java -jar tika-app1.13.jar -z <filename> --extract-dir=<dirname>
25. Department of Internal Affairs
Relationship Seven: Contains Object Reference
A digital preservation risk...
26. Department of Internal Affairs
Relationship Seven: Contains Object Reference
Extract files from PPT OLE2 -> Read PowerPoint Document Object ->
Look for:
27. Department of Internal Affairs
Relationship Eight: Item Mentions
Dictionary:
Helen Clark
Helen Elizabeth Clark
John Key
United Nations
Prime Minister
University of Auckland
Jenny Shipley
Labour Party
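One way such a dictionary could be applied, sketched under assumptions: treat each line of the dictionary as a fixed string and count case-insensitive matches in the extracted text. count_mentions is a hypothetical helper name, GNU grep is assumed, and the dictionary file should not contain blank lines.

```shell
# Sketch only: count how often each dictionary term occurs in text read
# from stdin. $1 is a dictionary file with one term per line.
# Matching is literal (-F) and case-insensitive (-i); matches are folded
# to lower case before counting. count_mentions is a hypothetical name.
count_mentions () {
    grep -oiFf "$1" | tr '[:upper:]' '[:lower:]' | sort | uniq -c | sort -rn
}

# Usage (dictionary.txt holds the terms listed above):
#   catdoc "$file" | count_mentions dictionary.txt
```

Each counted term is a candidate "Item Mentions" relation between the record and the person, organisation or role named.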
29. Department of Internal Affairs
Discussion
• Data structures – support needed in catalogue, and digital
preservation system...
• Extensible, flexible enough not to (need to) know what the
future holds...
• AS/NZS 5478:2015, Recordkeeping metadata property
reference set (RMPRS) states:
“The digital world is increasingly using networked
relationships”.
30. Department of Internal Affairs
Discussion
• Verhoeven (2016) – Devil’s Bridges!
– Ontological, graph/network based infrastructures
– Vernacular ontologies
– Understand, Make, Improve Quality of our Connections
– redistribution of power and the possibilities of world
making (and remaking) in the archive
31. Department of Internal Affairs
Providing the algorithms are transparent, what then
provides a more objective view of the world than machine
generated relations?
32. Department of Internal Affairs
Discussion
• ICA... RiC-R7: ‘is Draft Of’ semantics (A Speech):
– Still a draft if 80% content is different from published?
– Draft because it’s marked as such in metadata?
– Draft when it has been delivered in the wild?
• ICA... RiC-R4: ‘has Subject’ semantics (This Presentation):
– Graph technologies?
– Digital preservation?
– Processing of digital archives?
– Binary trees?!
33. Department of Internal Affairs
“RiC-CM aspires to reflect both facets of the Principle of Provenance, as
these have traditionally been understood and practiced, and at the same
time recognize a more expansive and dynamic understanding of
provenance. It is this more expansive understanding that is embodied in
the word “Contexts.” RiC-CM is intended to enable a fuller, if forever
incomplete, description of the contexts in which records emerge and exist,
so as to enable multiple perspectives and multiple avenues of access.”
34. Department of Internal Affairs
Discussion
• Impact for record keeping; transfer; digital preservation,
discovery...
• Digital preservation – linked objects, hyperlinks, embedded
objects...
• Not all geekery!
• Remember the content of these records...
• Remember the connections...
• Remember use-cases for digital preservation, it does not
operate, in and of itself!
35. Department of Internal Affairs
Conclusion
• “Computer forensic examiners are often overwhelmed with
data. Modern hard drives contain more information that
cannot be manually examined in a reasonable time period
creating a need for data reduction techniques.” - Kornblum
(2006)
• So how do we begin?
One relation at a time...
36. Department of Internal Affairs
Links
• Checksum 101: http://www.slideshare.net/RossSpencer/checksum-101
• SSCOMPARE: https://github.com/exponential-decay/sscompare
• TLSH Experiments: https://github.com/exponential-decay/tlsh-experiments
• Parallel Lines Workshop: https://github.com/andreakb/parallel-lines-workshop
• Apache Tika: https://tika.apache.org/
• Full Paper: Hopefully in Archives and Manuscripts sometime soon!