This document provides an overview of archival technologies presented at the 46th Annual Georgia Archives Institute on June 10-21, 2013. The presentation introduces various archival management tools like Archon and Archivists' Toolkit for managing archival collections. It also discusses digital collection management software such as CONTENTdm and Islandora. Emerging standards, formats and linked open data initiatives are also covered. The goal is to help archivists identify existing and new technologies that can help manage and provide access to archival materials.
The Data Lake Engine: Data Microservices in Spark using Apache Arrow Flight (Databricks)
Machine learning pipelines are a hot topic at the moment. Moving data through the pipeline in an efficient and predictable way is one of the most important aspects of running machine learning models in production.
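As a hedged illustration of the Arrow Flight idea in the title above: the sketch below fetches an Arrow table from a Flight data service using pyarrow. The endpoint grpc://localhost:8815 and the dataset path "example" are assumptions for the sketch, not details from the talk.

    import pyarrow.flight as flight

    # Connect to a (hypothetical) Flight data microservice.
    client = flight.connect("grpc://localhost:8815")

    # Ask the service where and how to fetch the dataset named "example".
    info = client.get_flight_info(flight.FlightDescriptor.for_path("example"))

    # Stream the record batches back and materialize them as an Arrow table.
    reader = client.do_get(info.endpoints[0].ticket)
    table = reader.read_all()
    print(table.num_rows, table.schema)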
This workshop presentation from Enterprise Knowledge team members Joe Hilger, Founder and COO, and Sara Nash, Technical Analyst, was delivered on June 8, 2020 as part of the Data Summit 2020 virtual conference. The 3-hour workshop provided an interdisciplinary group of participants with a definition of what a knowledge graph is, how it is implemented, and how it can be used to increase the value of your organization's data. This slide deck gives an overview of the KM concepts that are necessary for the implementation of knowledge graphs as a foundation for Enterprise Artificial Intelligence (AI). Hilger and Nash also outlined four use cases for knowledge graphs, including recommendation engines and natural language query on structured data.
Frame - Feature Management for Productive Machine Learning (David Stein)
Presented at the ML Platforms Meetup at Pinterest HQ in San Francisco on August 16, 2018.
Abstract: At LinkedIn we observed that much of the complexity in our machine learning applications was in their feature preparation workflows. To address this problem, we built Frame, a shared virtual feature store that provides a unified abstraction layer for accessing features by name. Frame removes the need for feature consumers to deal directly with underlying data sources, which are often different across computing environments. By simplifying feature preparation, Frame has made ML applications at LinkedIn easier to build, modify, and understand.
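Frame itself is internal to LinkedIn, so the following is purely a hypothetical sketch of the pattern the abstract describes: consumers request features by name, and the store resolves each name to whatever backing source it lives in. All names here (FeatureStore, get, the feature names) are illustrative, not LinkedIn's API.

    # Hypothetical sketch of a name-based feature access layer in the spirit
    # of Frame; class and method names are illustrative only.
    class FeatureStore:
        def __init__(self, sources):
            # feature name -> callable that fetches values from its source
            self._sources = sources

        def get(self, feature_names, entity_ids):
            # Consumers ask for features by name; the store resolves each
            # name to its underlying source (offline table, online store, ...)
            return {
                name: {eid: self._sources[name](eid) for eid in entity_ids}
                for name in feature_names
            }

    store = FeatureStore({
        "member_age": lambda eid: 34,        # stand-in for an offline table
        "num_connections": lambda eid: 412,  # stand-in for an online store
    })
    print(store.get(["member_age", "num_connections"], ["member:123"]))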
Self-Service Data Ingestion Using NiFi, StreamSets & Kafka (Guido Schmutz)
Many Big Data and IoT use cases are based on combining data from multiple data sources and making it available on a Big Data platform for analysis. The data sources are often very heterogeneous, from simple files and databases to high-volume event streams from sensors (IoT devices). It's important to retrieve this data in a secure and reliable manner and integrate it with the Big Data platform so that it is available for analysis in real time (stream processing) as well as in batch (typical big data processing). In the past few years, new tools have emerged which are especially capable of handling the process of integrating data from outside, often called Data Ingestion. From an outside perspective, they are very similar to traditional Enterprise Service Bus infrastructures, which larger organizations often use to handle message-driven and service-oriented systems. But there are also important differences: they are typically easier to scale horizontally, offer a more distributed setup, are capable of handling high volumes of data/messages, provide very detailed monitoring at the message level, and integrate very well with the Hadoop ecosystem. This session will present and compare Apache Flume, Apache NiFi, StreamSets and the Kafka ecosystem and show how they handle data ingestion in a Big Data solution architecture.
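As a minimal, hedged example of the Kafka side of such an ingestion pipeline: the sketch below publishes one JSON sensor reading to a topic using the kafka-python client. The broker address and topic name are assumptions.

    import json
    from kafka import KafkaProducer

    # Publish one IoT-style reading to a Kafka topic (broker/topic assumed).
    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )
    producer.send("sensor-readings", {"sensor_id": "s-42", "temp_c": 21.7})
    producer.flush()  # block until the message is actually delivered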
Deep dive into LangChain integration with Neo4j.pptx (TomazBratanic1)
Deep dive into LangChain integrations with Neo4j. Learn how to query your graph with LangChain either by generating Cypher statements using LLMs or using the vector index.
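A minimal sketch of the LLM-to-Cypher path described above, assuming a local Neo4j instance and an OpenAI API key. LangChain's module layout changes between releases, so treat the imports as indicative rather than exact.

    from langchain_community.graphs import Neo4jGraph
    from langchain.chains import GraphCypherQAChain
    from langchain_openai import ChatOpenAI

    # Connect to a local Neo4j instance; credentials are placeholders.
    graph = Neo4jGraph(url="bolt://localhost:7687",
                       username="neo4j", password="secret")

    # The chain asks the LLM to generate Cypher from the question, runs it
    # against the graph, and summarizes the result. (Newer releases also
    # require allow_dangerous_requests=True.)
    chain = GraphCypherQAChain.from_llm(ChatOpenAI(temperature=0),
                                        graph=graph, verbose=True)
    print(chain.invoke({"query": "Which movies did Tom Hanks act in?"}))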
Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data (Sören Auer)
Over the past 4 years, the Semantic Web activity has gained momentum with the widespread publishing of structured data as RDF. The Linked Data paradigm has therefore evolved from a practical research idea into a very promising candidate for addressing one of the biggest challenges of computer science: the exploitation of the Web as a platform for data and information integration. To translate this initial success into a world-scale reality, a number of research challenges need to be addressed: the performance gap between relational and RDF data management has to be closed, coherence and quality of data published on the Web have to be improved, provenance and trust on the Linked Data Web must be established, and generally the entrance barrier for data publishers and users has to be lowered. This tutorial will discuss approaches for tackling these challenges. As an example of a successful Linked Data project we will present DBpedia, which leverages Wikipedia by extracting structured information and making this information freely accessible on the Web. The tutorial will also outline some recent advances in DBpedia, such as the mappings Wiki, DBpedia Live, as well as the recently launched DBpedia benchmark.
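To make the DBpedia example concrete, here is a small hedged sketch querying the public endpoint with the SPARQLWrapper library; the specific classes and properties (dbo:City, dbo:populationTotal) are illustrative of DBpedia's ontology, and dbo:/dbr: are prefixes predefined on the endpoint.

    from SPARQLWrapper import SPARQLWrapper, JSON

    sparql = SPARQLWrapper("https://dbpedia.org/sparql")
    sparql.setQuery("""
        SELECT ?city ?population WHERE {
          ?city a dbo:City ;
                dbo:country dbr:Germany ;
                dbo:populationTotal ?population .
        } ORDER BY DESC(?population) LIMIT 5
    """)
    sparql.setReturnFormat(JSON)

    # Each binding row maps variable names to values.
    for row in sparql.query().convert()["results"]["bindings"]:
        print(row["city"]["value"], row["population"]["value"])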
Bighead: Airbnb’s End-to-End Machine Learning Platform with Krishna Puttaswa... (Databricks)
Airbnb has a wide variety of ML problems ranging from models on traditional structured data to models built on unstructured data such as user reviews, messages and listing images. The ability to build, iterate on, and maintain healthy machine learning models is critical to Airbnb’s success. Many ML Platforms cover data collection, feature engineering, training, deploying, productionalization, and monitoring but few, if any, do all of the above seamlessly.
Bighead aims to tie together various open source and in-house projects to remove incidental complexity from ML workflows. Bighead is built on Python and Spark and can be used in modular pieces as each ML problem presents unique challenges. Through standardization of the path to production, training environments and the methods for collecting and transforming data on Spark, each model is reproducible and iterable.
This talk covers the architecture, the problems that each individual component and the overall system aims to solve, and a vision for the future of machine learning infrastructure. It's widely adopted at Airbnb, and we have a variety of models running in production. We have seen overall model development time go down from many months to days on Bighead. We plan to open source Bighead to allow the wider community to benefit from our work.
- Learn to understand what knowledge graphs are for
- Understand the structure of knowledge graphs (and how it relates to taxonomies and ontologies)
- Understand how knowledge graphs can be created using manual, semi-automatic, and fully automatic methods.
- Understand knowledge graphs as a basis for data integration in companies
- Understand knowledge graphs as tools for data governance and data quality management
- Implement and further develop knowledge graphs in companies
- Query and visualize knowledge graphs (including SPARQL and SHACL crash course; a small validation sketch follows this list)
- Use knowledge graphs and machine learning to enable information retrieval, text mining and document classification with the highest precision
- Develop digital assistants and question and answer systems based on semantic knowledge graphs
- Understand how knowledge graphs can be combined with text mining and machine learning techniques
- Apply knowledge graphs in practice: Case studies and demo applications
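A minimal sketch of the SHACL validation mentioned above, using the rdflib and pyshacl libraries; the shape and data graphs are toy examples.

    from rdflib import Graph
    from pyshacl import validate

    # A shape requiring every ex:Person to have at least one string ex:name.
    shapes = Graph().parse(data="""
    @prefix sh:  <http://www.w3.org/ns/shacl#> .
    @prefix ex:  <http://example.org/> .
    @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
    ex:PersonShape a sh:NodeShape ;
        sh:targetClass ex:Person ;
        sh:property [ sh:path ex:name ; sh:minCount 1 ;
                      sh:datatype xsd:string ] .
    """, format="turtle")

    data = Graph().parse(data="""
    @prefix ex: <http://example.org/> .
    ex:alice a ex:Person .   # no ex:name, so validation should fail
    """, format="turtle")

    conforms, _, report = validate(data, shacl_graph=shapes)
    print(conforms)   # False: ex:alice is missing the required ex:name
    print(report)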
stackconf 2022: Introduction to Vector Search with Weaviate (NETWAYS)
In machine learning, e.g. in recommendation tools or data classification, data is often represented as high-dimensional vectors. These vectors are stored in so-called vector databases. With vector databases you can efficiently run searching, ranking and recommendation algorithms. Vector databases have therefore become the backbone of ML deployments in industry. This session is all about vector databases. If you are a data scientist or a data/software engineer, this session is for you. You will learn how you can easily run your favourite ML models with the vector database Weaviate. You will get an overview of what a vector database like Weaviate can offer: semantic search, question answering, data classification, named entity recognition, multimodal search, and much more. After this session, you will be able to load in your own data and query it with your preferred ML model!
Session outline
What is a vector database?
You will learn the basic principles of vector databases: how data is stored and retrieved, and how that differs from other database types (SQL, knowledge graphs, etc.).
Performing your first semantic search with the vector database Weaviate.
In this phase, you will learn how to set up a Weaviate vector database, how to make a data schema, how to load in data, and how to query data (see the sketch after this outline). You can follow along with examples, or you can use your own dataset.
Advanced search with the vector database Weaviate.
Finally, we will cover other functionalities of Weaviate: multi-modal search, data classification, connecting custom ML models, etc.
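The following is a minimal sketch of those three steps (schema, load, query) against a local Weaviate instance, using the v3 Python client; it assumes a text2vec vectorizer module is enabled, and the class and property names are illustrative.

    import weaviate

    # Connect to a locally running Weaviate instance (v3 client API).
    client = weaviate.Client("http://localhost:8080")

    # 1) Schema: one class whose text property is vectorized by a module.
    client.schema.create_class({
        "class": "Article",
        "vectorizer": "text2vec-transformers",
        "properties": [{"name": "content", "dataType": ["text"]}],
    })

    # 2) Load: each object is embedded on ingest by the vectorizer module.
    client.data_object.create(
        {"content": "Weaviate is a vector database."}, "Article")

    # 3) Query: nearText searches by meaning rather than exact keywords.
    result = (
        client.query.get("Article", ["content"])
        .with_near_text({"concepts": ["semantic search"]})
        .with_limit(3)
        .do()
    )
    print(result)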
Understanding Hallucinations in LLMs - 2023 09 29.pptx (Greg Makowski)
Hallucinations are a current fundamental problem for LLMs.
For one example, in June of this year in New York, attorneys did "research" on past cases with ChatGPT and turned it in to the judge as a brief. The opposing counsel reported to the judge that they could not find the cases. When the judge confronted the attorneys who had used the GPT, they stood behind their brief. The judge fined the firm $5,000.
Could this happen to you? YES. What can be done to avoid this in the future? I will answer.
In this talk, I will explain some fundamental areas of LLMs to show how and why hallucinations occur. To understand that, an introduction to how words, concepts and dialogs are represented will help.
Words were first represented as points in an embedding space with Word2Vec in 2013. This could compress a vocabulary of 10,000 words into vectors of 300 elements, with each word represented as a point in the 300-dimensional embedding space. Not just words can be represented: longer texts, such as books, can also be compressed into a type of embedding. In that situation, areas of the embedding space relate to different genres, such as non-fiction, science fiction, children's fiction and so on. A new data point between training data points, when converted to text, would be a hallucination. In the area of "legal cases" in embedding space, if there is not an exact match, the text generation will try to generate what is plausible.
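A toy sketch of the Word2Vec idea described above, using the gensim library; the tiny corpus is illustrative, and real training uses millions of sentences.

    from gensim.models import Word2Vec

    sentences = [["judge", "ruled", "on", "the", "case"],
                 ["the", "court", "heard", "the", "case"]]
    # Each word becomes a point in a 300-dimensional embedding space.
    model = Word2Vec(sentences, vector_size=300, window=5, min_count=1)
    print(model.wv["case"].shape)                 # (300,)
    print(model.wv.most_similar("case", topn=2))  # nearest neighbors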
During an LLM conversation, the output of the previous text provides context for the next text, in the style of a recurrent neural network. The starting position of a conversation matters. Understanding that areas of weight space represent genres like "non-fiction" or other language aspects, and that the starting position of a discussion time series matters, helps explain why prompt engineering works. The conversation is represented in the activations over the 7B or 500B weights, a much larger space. During a conversation, learning is not occurring, but the neural network activations are changing. The neural network is not a database. Even if you reach the exact set of weight activations from a training record, due to lossy compression, the exact text may not be regenerated.
ChatGPT does not use word embeddings. For implementation efficiency reasons, it is practical to break down what is embedded to about 50,000 items in a lookup table. Also, if we wanted to support proper nouns, like names, and dozens of languages, the number of words would be in the millions. ChatGPT and other LLMs use "tokens" for embedding. Examples of Byte Pair Encoding (BPE) and its process are given. The ChatGPT embedding is a vector of 1,536 numbers for each token.
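To see BPE-style tokens concretely, here is a small sketch with the tiktoken library; cl100k_base is the encoding used by recent OpenAI models, and the exact token ids are encoding-specific.

    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode("Hallucinations are a current fundamental problem")
    print(tokens)              # integer token ids, not whole words
    print(enc.decode(tokens))  # round-trips back to the original text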
A solution for today is Retrieval Augmented Generation (RAG). As a brief introduction: you ask a question in English or another natural language, and it is matched against a large library or database of paragraphs from internal documents or websites.
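A hedged, self-contained sketch of the retrieval step in RAG: embed the question, score it against stored paragraph embeddings, and hand the best match to the LLM as context. The embed function here is a toy stand-in for a real embedding model (such as the 1,536-dimension embeddings mentioned above).

    import numpy as np

    def embed(text):
        # Toy bag-of-characters embedding, just to make the sketch runnable;
        # a real system would call an embedding model instead.
        v = np.zeros(128)
        for ch in text.lower():
            v[ord(ch) % 128] += 1
        return v / (np.linalg.norm(v) + 1e-9)

    paragraphs = ["Case A: contract dispute over delivery terms.",
                  "Case B: negligence claim after a warehouse fire."]
    index = np.stack([embed(p) for p in paragraphs])

    question = "Which case involves a fire?"
    scores = index @ embed(question)           # cosine similarity (unit vectors)
    best = paragraphs[int(np.argmax(scores))]  # context passed to the LLM prompt
    print(best)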
UPDATED FOR 2014: Archives work is messy -- in many cases archivists have to organize and make accessible large amounts of mixed data in a variety of formats, both physical and digital. Thankfully, there are a variety of technology tools available to help solve the messiness problem and make collections more accessible. In this session, audience members will learn about current and emerging archival technology tools, the pros and cons of the major tools, and resources for further education.
These slides are the basis of an Open Repositories 2015 talk about Archivematica integration.
Abstract: The open repository ecosystem consists of many interlocking systems which satisfy needs at different points in content management workflows, and these differ within and among institutions. Archivematica is a digital preservation system which aims to integrate with existing repository, storage and access systems in order to leverage the resources that institutions have invested in building their repositories over time. The presentation will cover every integration the Archivematica project has completed thus far, including DSpace and DuraCloud, LOCKSS, Islandora/Fedora, Archivists' Toolkit, Access to Memory (AtoM), CONTENTdm, Arkivum, HP TRIM, and OpenStack, as well as ongoing projects with ArchivesSpace, Dataverse, and BitCurator. Each of these projects has had its own set of limitations in scope because of the requirements of the project sponsor and/or the limitations of other systems, so in many ways several of them are not, and may never be, 'complete' integrations. The discussion will explore what that means and strategies for expanding the functional capabilities of integration work over time. It will address scoping integration workflows and building requirements with limitations on functionality and resources. We will examine how systems can be built and enhanced in ways that accommodate diverse workflows and varied interlocking endpoints.
The Dendro research data management platform: Applying ontologies to long-ter... (João Rocha da Silva)
It has been shown that data management should start as early as possible in the research workflow to minimize the risks of data loss. Given the large numbers of datasets produced every day, curators may be unable to describe them all, so researchers should take an active part in the process. However, since they are not data management experts, they must be provided with user-friendly but powerful tools to capture the context information necessary for others to interpret and reuse their datasets. In this paper, we present Dendro, a fully ontology-based collaborative platform for research data management. Its graph data model innovates in the sense that it allows domain-specific lightweight ontologies to be used in resource description, acting as a staging area for later deposit in long-term preservation solutions.
INNOVATION AND RESEARCH (Digital Library Information Access) (Libcorpio)
Innovation and research, Digital Library Information Access, LIS Education, Library and Information Science, LIS Studies, Information Management, Education and Learning, Library science, Information science, Digital Libraries, Research on Digital Libraries, DL, Innovation in libraries and publishing, Areas of Research for DL, Information Discovery, Collection Management and Preservation, Interoperability, Economic, Social and Legal Issues, Core Topics In Digital Libraries, DL Research Around The World
Slides accompanying a day-long introduction to AtoM and Archivematica, presented by Dan Gillean and Justin Simpson at the UK National Archives as part of an AIM25 and Higher Education Archive Programme Network Meeting, December 2, 2016.
Technologie Proche: Imagining the Archival Systems of Tomorrow With the Tools... (Artefactual Systems - AtoM)
These slides accompanied a June 4th, 2016 presentation made by Dan Gillean of Artefactual Systems at the Association of Canadian Archivists' 2016 Conference in Montreal, QC, Canada.
This presentation aims to examine several existing or emerging computing paradigms, with specific examples, to imagine how they might inform next-generation archival systems to support digital preservation, description, and access. Topics covered include:
- Distributed Version Control and git
- P2P architectures and the BitTorrent protocol
- Linked Open Data and RDF
- Blockchain technology
The session is part of an attempt by the ACA to create interactive "working sessions" at its conferences. Accompanying notes can be found at: http://bit.ly/tech-Proche
Participants were also asked to use the Twitter hashtag of #techProche for online interaction during the session.
"Filling the Digital Preservation Gap" with ArchivematicaJenny Mitcham
A webinar given by Jenny Mitcham and Simon Wilson to Digital Preservation Coalition members on 25th November 2015. It describes work underway in the "Filling the Digital Preservation Gap" project using Archivematica to preserve research data
Overview of the ITS department's projects, services, and staff. A look at our areas, including IT infrastructure, e-resources management, digital library services, and admin & communication.
Research Data (and Software) Management at Imperial: (Everything you need to ... (Sarah Anna Stewart)
A presentation on research data management tools, workflows and best practices at Imperial College London with a focus on software management. Presented at the 2017 session of the HPC Summer School (Dept. of Computing).
Building the Future Together: AtoM3, Governance, and the Sustainability of Op... (Artefactual Systems - AtoM)
Slides accompanying a presentation given by Dan Gillean on June 7th, 2018 at Open Repositories 2018, held in Bozeman, MT.
Access to Memory (AtoM) is a web-based open source application for standards-based description and access. AtoM was first released in 2008, and much of the codebase now relies on deprecated frameworks and libraries; at the same time, new standards and technologies are changing how our profession approaches description and access. Currently Artefactual Systems, a Canada-based company, uses a services model to support the project. Artefactual is looking ahead to AtoM3, and considering building a linked-data-driven platform for archival description and access. As we consider AtoM's next generation, we are also examining governance and maintenance models to sustain the project and better empower our user community, as Artefactual wasn't originally intended to be AtoM's organizational home. This presentation will offer some thoughts on existing open source project governance models, challenges, and possibilities for the future. How do we ensure community engagement and project sustainability over time?
Project update: A collaborative approach to "filling the digital preservation... (Jenny Mitcham)
A presentation given by Julie Allinson at the UK Archivematica group meeting on 6th November 2015 in Leeds. It describes work underway in the "Filling the Digital Preservation Gap" project using Archivematica to preserve research data
The Oxford Common File Layout: A common approach to digital preservation (Simeon Warner)
The Oxford Common File Layout (OCFL) specification began as a discussion at a Fedora/Samvera Camp held at Oxford University in September of 2017. Since then, it has grown into a focused community effort to define an open and application-independent approach to the long-term preservation of digital objects. Developed for structured, transparent, and predictable storage, it is designed to promote sustainable long-term access and management of content within digital repositories. This presentation will focus on the motivations and vision for the OCFL, explain key choices for the specification, and describe the status of implementation efforts.
Selecting a Digital Collections Management System: Getting Large Projects Don... (Cliff Landis)
In 2017-2018, the Atlanta University Center Robert W. Woodruff Library selected a digital collections management system in seven months by using practical project management, documentation, and communication techniques to set up workflows, balance schedules, and ensure that all stakeholders had a voice. Session attendees will learn how to take the pressure off large projects by breaking them down into discrete phases based around producing documentation. Lessons learned will also be discussed, with pragmatic tips to avoid problems in future projects.
Society of Georgia Archivists 2018 Annual Meeting
This presentation provides an accessible introduction to Linked Open Data (LOD) and how LOD is modelled and made available online. The presenters will discuss several LOD projects created by libraries and archives in order to illustrate the benefits of applying LOD principles and practices. They will also demonstrate easy ways to leverage the power of LOD for archival organizations and their digital collections, with concrete examples involving WikiData, Omeka S, and the SNAC (Social Networks and Archival Context) Project.
Society of Georgia Archivists 2018 Annual Meeting
Speakers:
Josh Hogan, Atlanta University Center Robert W. Woodruff Library
Cliff Landis, Atlanta University Center Robert W. Woodruff Library
Did you ever wish that someone would just hand you a checklist for getting a job? Great news – this session will not only give you that checklist, but walk you through the whole process! Learn how to use self-assessment to review your skills, biases, values and interests. Improve your chances by leveraging your GLA membership, searching for jobs in library-adjacent fields, breaking down job advertisements, and writing audience-focused application packets. Wrap up by setting yourself up for success in your first six months on the job while helping your colleagues get their own dream jobs!
An Introduction to Linked Data for Librarians (2018-06-28) (Cliff Landis)
Presented to the Special Libraries Association Georgia Chapter for their Spring Luncheon. This presentation gives advice for librarians on how to get started exploring and implementing linked data.
Digitization at the AUC Robert W. Woodruff Library - A Case Study (Cliff Landis)
Presentations on digitization often focus on following best practices and standards, but rarely give you the "behind the scenes" view that many of us crave. See how the rubber meets the road at the AUC Robert W. Woodruff Library in this two-part case study series on digitization and metadata. In this session we'll discuss our equipment setup, material selection process, detailed workflows, and solutions for dealing with challenging materials. Additionally, you'll learn how our methods of documentation, collaboration, and philosophy keep the digitization machine running smoothly!
Learning Outcomes:
-- Discover how documentation and collaboration have averted digitization disasters.
-- Document work quickly and easily in OneNote, and share it with colleagues effortlessly.
-- Explore how workflows can be customized to match context.
-- Learn about the "fail fast and learn" approach and how to put it into practice.
A very brief slide deck on the basics of conflict, with library-themed examples. Presented to Atlanta Emerging Librarians for the panel "You Got the Job – Now What? Rising to the Challenge in Your New Library Position"
Take a glimpse at the emerging technologies, biotechnological enhancements, and shifts in information culture that are impacting the future of libraries.
Is a text reference service right for you? Four academic libraries in Georgia summarize their experience with text reference to help you understand the technology options, set-up issues, and patron usage of the service.
Casey Long, Agnes Scott College
Sarah Steiner, Georgia State University
Jeff Gallant, Valdosta State University
James Stephens, Savannah State University
Cliff Landis, Georgia State University
Say What You Mean: Professional Communication Skills for Librarians (Cliff Landis)
Excellent interpersonal communication skills are not just a requirement on every job announcement--they are vital to succeed in today's library! Attendees will learn how to use different communication styles to interact effectively with people across several library settings. A variety of interpersonal communication topics will be covered, including: basic communication skills, direct vs. indirect communication, conflict management, and professional relationship maintenance.
This presentation outlines the development of the concept of Library 2.0, how it is being implemented in one library, and what its possibilities are for the future.
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf (TechSoup)
In this webinar you will learn how your organization can access TechSoup's wide variety of product discount and donation programs. From hardware to software, we'll give you a tour of the tools available to help your nonprofit with productivity, collaboration, financial management, donor tracking, security, and more.
Operation “Blue Star” is the only event in the history of independent India where the state went to war with its own people. Even after about 40 years it is not clear whether it was the culmination of the state's anger at the people of the region, a political game of power, or the start of a dictatorial chapter in the democratic setup.
The people of Punjab felt alienated from the mainstream due to the denial of their just demands during a long democratic struggle since independence. As has happened all over the world, this led to a militant struggle with great loss of lives among military, police and civilian personnel. The killing of Indira Gandhi and the massacre of innocent Sikhs in Delhi and other Indian cities were also associated with this movement.
Palestine last event orientation.pptx (RaedMohamed3)
An EFL lesson about the current events in Palestine. It is intended for intermediate students who wish to improve their listening skills through a short PowerPoint lesson.
Students, digital devices and success - Andreas Schleicher - 27 May 2024..pptx (EduSkills OECD)
Andreas Schleicher presents at the OECD webinar ‘Digital devices in schools: detrimental distraction or secret to success?’ on 27 May 2024. The presentation was based on findings from PISA 2022 results and the webinar helped launch the PISA in Focus ‘Managing screen time: How to protect and equip students against distraction’ https://www.oecd-ilibrary.org/education/managing-screen-time_7c225af4-en and the OECD Education Policy Perspective ‘Students, digital devices and success’ can be found here - https://oe.cd/il/5yV
3. Learning Objectives
● Identify existing and emerging areas of archival technology development.
● Learn about the capabilities, pros, and cons of major archival management tools, such as Archon and Archivists' Toolkit.
● Learn about the capabilities, pros, and cons of major digital collection management tools, such as CONTENTdm and Islandora.
● Discover resources for further professional development in archival technology areas such as software, hardware, and standards.
4. Introductions
EGO TIME!
● Library (and Archival) Technologist
● Author of A Social Networking Primer for Librarians (2010)
● Professional Geek
● I work as a translator between several library dialects including: Student, Techie, Librarian, Archivist and Administrator!
5. Why does this stuff matter?
http://www.flickr.com/photos/80749232@N00/2563365462/
6. Two questions
1) What one thing do you hope to learn today?
2) What one thing do you hope to do with archival technology?
11. It's all about using the right tool for the job...
http://www.flickr.com/photos/takomabibelot/4355506368/
12. Preliminary Considerations
● Free vs. paid
● Open source vs. closed source
● Local server vs. cloud hosted
● Few features vs. many features (vs. some features)
● Web-based vs. client-based
● Ease of setup, ease of use
● Degree of technical support
● Standards compliance
16. Archon
● Developed by the University of Illinois at Urbana-Champaign (2006-2011).
● Free, Open-Source Software (FOSS), locally hosted, many features, limited exports.
● Has both a back-end (for managing records) and a front-end (for access).
● Full life-cycle management. Lacks some features (some metadata exports, deaccessioning, etc.).
20. Archivists' Toolkit
● Developed with a Mellon Foundation grant and continued by Five Colleges, Inc., New York University Libraries, and the UC San Diego Libraries (2006-2009).
● Free, Open-Source Software (FOSS), locally hosted, many features, exports in many standards/formats.
● Server and client software
● Has a back-end (for managing records). No web publishing available.
● Full life-cycle management. Lacks some features (backup/restore, publishing finding aids, etc.)
22. Up next: ArchivesSpace
● Funded by a Mellon Foundation grant, created by New York University, the University of California San Diego, and the University of Illinois at Urbana-Champaign. Hmmmm...those names look familiar...
● The best of both worlds?
● Version 1.0 in "late July" 2013
● Membership option, free option
● "Organizational home" at LYRASIS
● http://www.archivesspace.org/
● https://github.com/hudmol/archivesspace
24. ICA-AtoM
● ICA-AtoM is web-based archival description software that is based on International Council on Archives ('ICA') standards. 'AtoM' is an acronym for 'Access to Memory' (2008-2013).
● Developed by Artefactual Systems in collaboration with the ICA Program Commission (PCOM) and a growing network of international partners.
● Free, Open-Source Software (FOSS). Web-based, so requires server or virtual appliance setup.
25. Others
● Adlib Archive
● Calm for Archives
● Cuadra STAR / Archives
● Eloquent Archives
● MINISIS M2A
● Collective Access
● PastPerfect
...and many more
28. Fedora
● NOT the Linux operating system....
● aka: Fedora Repository / Fedora Commons
● Developed by Cornell University and the University of Virginia Library, currently supported by DuraSpace
● FOSS, server-side.
● Flexible architecture, allowing you to customize it (add on components) to meet local needs. Requires more work.
● Ingest, management, and basic delivery -- not a full-fledged system for managing digital assets.
30. Islandora
● Fedora (asset management), Drupal (website functionality) and Solr (search). Additional "Solution Packs" of software to manage particular data types (books, PDFs, large images, etc.).
● Developed by the University of Prince Edward Island.
● FOSS, server-side. Has to be assembled by programmers / systems folks. Requires a LOT of work and maintenance at this point. Not a "download and double-click" software.
33. CONTENTdm (and a lot of work...)
http://digitalcollections.library.gsu.edu/maps/?overlay=atlpm0031e
34. CONTENTdm
● Closed source, OCLC, and paid (expensive!).
● A full system for managing digital collections. Can be hosted by OCLC or run on your own servers (hosted version limits customization).
● Server-side software, web interface and project client software. Lots of moving pieces to get to work together with limited documentation and slow technical support response time.
37. Greenstone
● Developed by the New Zealand Digital Library Project at the University of Waikato, with support from UNESCO.
● FOSS, server-side.
● Multi-lingual and multi-national.
● Unsure how active the development community is, as I haven't seen much work on it since 2012.
40. DSpace
● Developed by the MIT Libraries and Hewlett-Packard Labs
● FOSS, server-side. Hosted option available (DSpaceDirect)
● Manakin add-on for improved user interface
● Not easy to set up or customize, but effective
41. Others
● Tripod2 (Duke University, in-house)
● Keystone (Index Data)
● EPrints (University of Southampton)
● and many more...
45. Omeka
● Web publishing of narratives around digital collections.
● Center for History and New Media (CHNM) at George Mason University
● FOSS, server-side. Hosted versions also available.
● Designed to be relatively easy to use for non-technical folks.
● Has plugins available for additional functionality (OAI-PMH, CSV import, Dublin Core, etc.)
46. Others
● Collective Access
● Virtual Exhibit (for Past Perfect)
● Internet Archive
● Picasa/Flickr
● Blogs/Websites
● and many more...
48. Digital Preservation Hardware
● Media readers (drives, connections)
○ Floppy Discs
○ Zip
○ Jaz
○ CD / DVD / BluRay / Laserdisc
○ Cartridges
○ Microcards
● Write-blockers / Forensic Bridges
○ Tableau
○ WiebeTech
See: Webinar: “Intro to Digital Preservation #3: Management of Incoming Born-Digital Special Collections”
49. Digital Preservation Software
● FITS & JHOVE: used to identify file formats
and extract metadata
● IdentityFinder: searches for Personally
Identifiable Information (PII)
● PREMIS: manage metadata of digital objects
● Bagit: file transfers
● BitCurator & Archivematica: accessioning
through access
See: Intro to Digital Preservation websinar series
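As a concrete example of one item from this list: the Library of Congress bagit-python library can package a directory for transfer in one call. A minimal sketch, assuming a directory ./accession-001 exists:

    import bagit

    # Converts ./accession-001 in place into a BagIt bag: the payload moves
    # into data/, and checksum manifests plus bag-info.txt are written.
    bag = bagit.make_bag("accession-001", {"Contact-Name": "A. Archivist"})
    print(bag.is_valid())  # True if the payload matches the manifests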
50. Formats & Protocols & Standards!
● XML: eXtensible Markup Language
● DTD: Document Type Definition (aka "Schema")
● EAD: Encoded Archival Description
● OAI-PMH: The Open Archives Initiative Protocol for
Metadata Harvesting
● OAI-ORE: The Open Archives Initiative Protocol for
Object Reuse and Exchange
● RSS: Really Simple Syndication
● DC: Dublin Core (also DCMI)
● RDF: Resource Description Framework
● SQL: Structured Query Language
● MODS: Metadata Object Description Schema
● METS: Metadata Encoding and Transmission Standard
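As a small illustration of how simple OAI-PMH harvesting is in practice: it is plain HTTP with a verb parameter. A hedged sketch using the requests library and a hypothetical endpoint URL:

    import requests

    # Any OAI-PMH data provider works this way; the base URL is hypothetical.
    BASE = "https://example.org/oai"
    resp = requests.get(BASE, params={"verb": "ListRecords",
                                      "metadataPrefix": "oai_dc"})
    resp.raise_for_status()
    print(resp.text[:500])  # XML envelope with <record> elements (Dublin Core)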
55. Semantic Web
BUT WHAT DOES IT ALL MEAN?!?!!
● Microformats: a way of adding human- and machine-
readable metadata into existing HTML webpages.
○ COinS: ContextObjects in Spans. Allows users to
embed machine-readable bibliographic metadata in
HTML webpages.
● RDFa Lite: Resource Description Framework in
attributes - another way of adding human- and machine-
readable metadata into existing HTML pages.
56. Why does this stuff matter?
http://www.flickr.com/photos/80749232@N00/2563365462/
57. Resources:
● Spiro, Lisa (2009). Archival Management Software: A Report for the Council on Library and Information Resources. http://www.clir.org/pubs/reports/spiro/spiro_Jan13.pdf and http://archivalsoftware.pbworks.com
● Bean, Carol (2010). Comparing Digital Library Systems (BeanWorks). http://beanworks.clbean.com/2010/04/comparing-digital-library-systems/
● Association of Southeastern Research Libraries. Archived Webinars / Materials. http://aserl.org/archive
● Digital Preservation - Tools Showcase. http://www.digitalpreservation.gov/tools/
● W3Schools. http://www.w3schools.com/
58. Not that it has to be said, but...
Disclaimer!
All images and excerpts included are being used under the auspices of Fair Use for the purposes of nonprofit education, criticism, and comment as outlined in 17 U.S.C. § 107.