The document summarizes the history and operations of the ALTO Editorial Board. It presents ALTO as an XML standard for representing the text and layout of digitized documents. The board has an international membership representing major libraries and meets regularly to review proposals to update the ALTO standard. It follows a standardized process for submitting and reviewing proposals, with a designated member championing each one, and aims to balance functionality improvements with backward compatibility for the digital library systems that use ALTO.
Open-Content Text Corpus for African languages (Guy De Pauw)
The document discusses the Open-Content Text Corpus (OCTC), a platform for storing and sharing open-content data for African languages. The OCTC aims to address issues of "data islands" by providing a secure platform for researchers to store and build upon each other's data. It uses XML encoding and supports over 50 languages with initial data sets. The OCTC could help preserve data, improve data through collaboration, train researchers, and be a low-cost solution by fighting data isolationism.
Pratt SILS Knowledge Organization Spring 2011 (PrattSILS)
The document discusses folksonomies as a method for organizing information through individual and collaborative tagging. It defines folksonomies and compares them to traditional classification systems like Dewey and LCSH. It also discusses theories of folksonomies, how they are created through tagging in a Web 2.0 environment, and how multiple perspectives can be represented. The benefits and potential issues of using folksonomies in libraries and other information settings are considered. Examples of open source applications that use folksonomies are provided.
Kimberly Silk presented on data management and discovery at the Martin Prosperity Institute. The MPI collects large social science datasets from various common and authoritative sources to support research. To better organize their growing collection, the MPI implemented an open data discovery platform called Dataverse to catalog and provide access to their datasets. Open data initiatives aim to make certain government data freely available to the public, but also present challenges around data preparation, support, and responsiveness. Big data refers to extremely large datasets beyond the capabilities of typical database tools, and data visualization is an important way to communicate insights from data.
LIS 653 Knowledge Organization | Pratt Institute School of Information | Fall... (PrattSILS)
This document discusses challenges related to using Twitter data for research purposes. Twitter has restrictions on the distribution and download of tweet IDs and user IDs. Researchers are limited to hydrating up to 50,000 public tweets per day. Social media collections within web archives tend to be event-driven and limited in scope. The algorithms used by Twitter to generate sample sizes cannot be verified by researchers. Storage space and sufficient computing infrastructure are also challenges. The Library of Congress has archived over 170 billion tweets but has not yet provided full access due to technical limitations.
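The "hydration" limit mentioned above means researchers receive only tweet IDs and must look the full tweets back up themselves, in batches, without exceeding the daily ceiling. A minimal sketch of that batching logic is below; the 100-IDs-per-request batch size is an assumption typical of Twitter's lookup endpoints, and no actual API call is made:

```python
# Illustrative sketch (not a real client): plan "hydration" requests for a
# list of tweet IDs, assuming an endpoint that accepts 100 IDs per call and
# the 50,000-tweets-per-day ceiling described in the document.
DAILY_CAP = 50_000
IDS_PER_REQUEST = 100  # assumed batch size for a tweet-lookup endpoint

def plan_hydration(tweet_ids, already_hydrated_today=0):
    """Split tweet IDs into request-sized batches without exceeding the cap."""
    remaining = max(0, DAILY_CAP - already_hydrated_today)
    ids = list(tweet_ids)[:remaining]
    return [ids[i:i + IDS_PER_REQUEST] for i in range(0, len(ids), IDS_PER_REQUEST)]

batches = plan_hydration(range(250))
print(len(batches), [len(b) for b in batches])  # 3 [100, 100, 50]
```

A real harvester would send each batch to the lookup endpoint and pause once the daily cap is reached.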
20120820 conversion of historic newspapers to digital objects [boris yeltsin ... (Frederick Zarndt)
This document provides statistics about the digitization of newspapers and use of digital and physical newspapers. It includes data such as population sizes, numbers of unique visitors to digital newspaper collections, requests for physical newspapers in reading rooms, percentages of genealogical and other users, and ages of users. The statistics cover countries such as Australia, France, the Netherlands, New Zealand, Norway, Singapore, the UK, and the USA. It compares numbers of requests for physical newspapers to numbers of visitors for digitized historical newspapers. It also provides additional data on some digital newspaper collections, including their approximate sizes in pages and numbers of lines corrected.
The document discusses the importance of effective communication across cultures and the challenges that can arise due to differences in language, communication styles, and cultural understandings. It notes that surveys have found language barriers and cultural differences to be among the biggest problems with offshore outsourcing projects. The document then defines culture and discusses different levels of mental programming, cultural expectations, stereotypes, and the need for cross-cultural proficiency. It also examines basic human nature and activities as well as how culture can influence communication and the perception process.
International Newspaper Digitization: ALA Newspaper Interest Group (Frederick Zarndt)
This document summarizes newspaper digitization efforts abroad. It discusses common file formats, software, and scale of efforts. For example, the National Library of Australia has scanned over 2 million pages and made 450,000 pages searchable, with a goal of 4 million pages by 2010. The Bibliothèque nationale de France has digitized 2 million pages and aims for 3.5 million by 2010. Born digital newspapers are also being collected and archived in some countries. Access and copyright policies for in-copyright materials vary between institutions.
20120821 putting the world’s cultural heritage online with crowd sourcing [na... (Frederick Zarndt)
This document discusses how cultural heritage organizations are using crowdsourcing to digitize collections. It provides examples of several crowdsourced projects, including the National Library of Australia's Trove newspaper digitization project, Family Search indexing of genealogical records, and the California Digital Newspaper Collection's crowdsourced OCR text correction. The document also examines the motivations and types of participants in these crowdsourced cultural heritage projects.
201308 wlic standards committee zarndt et al the alto editorial board collabo... (Frederick Zarndt)
The document provides information about the ALTO Editorial Board, which maintains the ALTO (Analyzed Layout and Text Object) XML standard for describing page layout and content metadata. The board was established in 2009 and is comprised of members from libraries and organizations around the world. The board aims to promote ALTO usage and ensure the standard evolves to meet emerging needs. Meeting agendas, procedures, and examples of board members' motivations for participation are presented.
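To make the standard concrete, the sketch below parses a minimal, simplified ALTO fragment with Python's standard library. The element names and namespace follow ALTO v2, but the sample content is invented and omits much that a real file carries (a Description section, measurement units, styles):

```python
import xml.etree.ElementTree as ET

# A minimal, simplified ALTO v2 fragment (illustrative only).
ALTO_SAMPLE = """\
<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout>
    <Page ID="P1" WIDTH="2000" HEIGHT="3000">
      <PrintSpace>
        <TextBlock ID="TB1">
          <TextLine>
            <String CONTENT="Hello" HPOS="120" VPOS="240" WIDTH="90" HEIGHT="32"/>
            <SP WIDTH="12"/>
            <String CONTENT="world" HPOS="222" VPOS="240" WIDTH="96" HEIGHT="32"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>
"""

NS = {"alto": "http://www.loc.gov/standards/alto/ns-v2#"}

def line_texts(alto_xml):
    """Join the CONTENT of each <String> in every <TextLine> into plain text."""
    root = ET.fromstring(alto_xml)
    lines = []
    for text_line in root.iterfind(".//alto:TextLine", NS):
        words = [s.get("CONTENT", "") for s in text_line.iterfind("alto:String", NS)]
        lines.append(" ".join(words))
    return lines

print(line_texts(ALTO_SAMPLE))  # ['Hello world']
```

The coordinate attributes (HPOS, VPOS, WIDTH, HEIGHT) are what lets ALTO tie recognized text back to its position on the page image.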
This document summarizes the development of digital tools by the University Library "Svetozar Markovic" in Belgrade to handle METS/ALTO files of digitized historical Serbian newspapers. It describes:
1) A search interface and backend search engine that allows users to search 400,000 pages of newspapers and view search results and full text articles.
2) Tools developed for manual correction of OCR text errors and automatic detection of incorrectly scanned pages.
3) Experience gained from user feedback on the digital collection and recommendations for future work in newspaper digitization.
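The document does not specify how the Belgrade team's automatic detector works, but a common low-cost heuristic in this spirit is to flag pages whose OCR output contains too little plausible text. A hedged, minimal sketch:

```python
# Hypothetical heuristic (the actual detector is not described in the
# document): flag a page as suspect when too small a share of its OCR
# output consists of letters, digits, or spaces.
def suspect_page(ocr_text, threshold=0.6):
    """True if the share of letters/digits/spaces falls below `threshold`."""
    if not ocr_text:
        return True
    ok = sum(c.isalnum() or c.isspace() for c in ocr_text)
    return ok / len(ocr_text) < threshold

print(suspect_page("Beograd, 12. januar 1936. Vesti dana"))  # False
print(suspect_page("~#@%^& ||| ---- ~~%% @@##"))             # True
```

Pages flagged this way could then be queued for rescanning or routed to the manual-correction tools described above.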
This presentation was provided by Ted Koppel of Auto-Graphics, Inc., Ed Riding of SirsiDynix, Andrew K. Pace of OCLC, and John Mark Ockerbloom of The University of Pennsylvania, during the NISO webinar "Library Systems & Interoperability: Breaking Down Silos," held on June 10, 2009.
The IETF is organized into working groups that focus on specific topics and are overseen by area directors. It is formally part of the Internet Society and overseen by the Internet Architecture Board. The IETF began in 1986 and has grown significantly over time to include thousands of participants working to develop open standards for internet technologies through an open process.
Briefing on OASIS XLIFF OMOS TC 20160121 (Jamie Clark)
The briefing provided an overview of the XLIFF OMOS TC, which aims to develop an abstract object model and JSON serialization for the XLIFF 2.0 standard to improve interoperability. The TC is chaired by David Filip and has members from various organizations. It plans to deliver an object model for XLIFF 2.x, a JSON version called JLIFF 1.0, and work on a new version of TMX with an inline data model consistent with XLIFF 2.0. The TC uses a non-assertion IPR mode and invites participation from stakeholders in multilingual content and localization.
Rozz Evans, Collection Development Librarian at the Institute of Education, University of London, spoke about the development, content, and future plans of the Digital Education Resource Archive (DERA).
An Ontology For Historical Research Documents (Dereck Downing)
The document describes an ontology-based digital archive called STOLE that collects historical journal articles from 1848-1946 about public administration in Italy. STOLE uses ontologies to semantically describe the domain knowledge and provide integration of data to support historians' research. The system architecture includes modules for an ontology, inference engine, triple store, SPARQL endpoint, and GUI. Future work plans to develop a graphical interface for ontology population and add semantic indexing capabilities.
Liberate Your Library: Building A Scottish Consortium, November 16th 2009 (Jonathan Field)
The document outlines the agenda for an event on open source library management systems. It discusses the current state of the proprietary library system market and frustrations libraries face. It then introduces open source software and examples of successful open source projects. The rest of the agenda covers various open source library system modules and configuration. PTFS Europe is introduced as a supporter of open source systems like Koha and Evergreen, providing implementation and support services to help libraries migrate to these systems.
Stronger together: community initiatives in journal management (Jisc)
There has been a recent growth of initiatives to address common problems regarding current and long-term access to e-journal content. Jisc is at the forefront of many of these with the close participation and active input of educational institutions.
This session aims to summarise the current state of key themes with pointers to future directions of areas such as sustainability, the move towards e-only environments, and shared consortia approaches. It will provide an overview and panel discussion on developing the supporting infrastructure to meet the needs of users. The discussion will focus on how institutions, community bodies and service providers can best work together to ensure sustainable, long-term initiatives by seeking to introduce uniformity, standardisation and collaboration to an even greater extent.
The session will introduce two new Jisc-supported projects in this area, the Keepers Registry Extra and SafeNet initiatives, and discuss how these fit alongside existing Jisc services such as Knowledge Base+, UK LOCKSS Alliance, Journal Archives and JUSP (Journal Usage Statistics Portal). The panel will address how this catalogue of services contributes towards a coherent strategy in the management of e-journal content.
Slides accompanying a presentation delivered at the VII Congresso Nacional de Arquivologia in Fortaleza, Brazil, on October 19th, 2016. The slides provide an overview of the AtoM project's history, its maintenance by Artefactual, and its development philosophy, before proceeding to examine the application as a component used in a digital preservation ecosystem. Aspects of ISO 16363:2012, the Audit and Certification of Trustworthy Digital Repositories standard, are used to evaluate how AtoM can support description, management, administration, and access functions when used to maintain a chain of custody in a trustworthy digital repository ecosystem.
Leslie Johnston: Challenges of Preserving Every Digital Format, 2012 (lljohnston)
The document discusses some of the challenges the Library of Congress faces in collecting and preserving digital content. It receives content in a wide variety of formats from different programs and partners. These include digitized newspapers, web archives, audiovisual content, tweets, and electronic publications. The Library uses various strategies to help manage this complex task, such as file format standards, multiple copies in different locations, and partnerships with other institutions. However, the diversity of formats and sources means preserving every digital format is extremely challenging.
Manchester Seminar: Liberate Your Library, October 2009 (Jonathan Field)
The document discusses open source library management software options. PTFS Europe offers support for the open source software Evergreen and Koha. They can help libraries with implementation, hosting, training and ongoing support. Using open source software provides benefits like reduced costs, increased flexibility and opportunities for collaboration compared to proprietary systems.
Between 2009 and 2012 the Higher Education Funding Council for England (HEFCE) funded a series of programmes to encourage higher education institutions in the UK to release existing educational content as Open Educational Resources (OER) and to embed open practices in the institution. The HEFCE-funded UK OER Programmes were run and managed by the JISC and the Higher Education Academy. Over the course of three years about £15M (€17.5M) was invested in projects that investigated the release and collection of OERs by individuals, institutions and subject communities. The Cetis "OER Technology Support Project" provided support for technical innovation across this programme.
In this conference paper we will present our reflections on the technical approaches taken, issues raised and the lessons learnt from the Programmes and the Support Project. The issues covered include resource management, resource description, licensing and attribution, search engine optimisation and discoverability, tracking OERs, and paradata (activity data about learning resources). Technical solutions discussed will include the use of social sharing platforms such as flickr and WordPress for resource dissemination; metadata embedded in HTML documents as RDFa, microdata and using the schema.org ontology; and sharing metadata and paradata using the Learning Registry (a network of schema-free data stores). As well as describing the achievements of the programme, we will also discuss the difficulties encountered and identify areas where further work is required.
Harvesting Repositories: DPLA, Europeana, & Other Case Studies (eohallor)
Join this discussion on the benefits and process of harvesting to aggregators such as DPLA and Europeana. Through case studies we'll outline three stages of the process: 1) mapping, migrating, and normalizing data in open source digital repositories, 2) making use of the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), and 3) reaping the benefits of increased exposure. Presenters welcome lively discussion and questions from participants of all technical backgrounds and skill levels.
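The OAI-PMH stage can be sketched briefly: a harvester requests ListRecords from a repository's base URL and pages through results while a resumptionToken is present. The sketch below parses one such response with Python's standard library; the sample XML is a trimmed, hypothetical response, though the verb, namespaces, and token mechanism follow the OAI-PMH 2.0 specification:

```python
import xml.etree.ElementTree as ET

# A trimmed, hypothetical ListRecords response. A live harvester would GET
#   <base-url>?verb=ListRecords&metadataPrefix=oai_dc
# and keep requesting while a <resumptionToken> is present.
SAMPLE = """\
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample newspaper page</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page2token</resumptionToken>
  </ListRecords>
</OAI-PMH>
"""

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def parse_list_records(xml_text):
    """Return (Dublin Core titles, resumption token) from one response page."""
    root = ET.fromstring(xml_text)
    titles = [t.text for t in root.iterfind(".//dc:title", NS)]
    token_el = root.find(".//oai:resumptionToken", NS)
    return titles, (token_el.text if token_el is not None else None)

print(parse_list_records(SAMPLE))  # (['Sample newspaper page'], 'page2token')
```

When the token comes back empty or absent, the harvest of that set is complete.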
Digitization of the Tuol Sleng Genocide Museum Archives (Frederick Zarndt)
This document provides a summary of a report on a project to preserve, digitize, index and host archives from the Tuol Sleng Genocide Museum in Cambodia. The project aims to spread an objective vision of history by digitizing over 400,000 pages of materials related to the Khmer Rouge regime and Tuol Sleng prison. Key aspects of the project include training museum staff, digitizing the materials to high standards, creating searchable databases and indexes, and developing a public-facing website with crowd-sourcing capabilities to engage the Cambodian people. Challenges include the fragile materials, limited local skills and equipment, and ensuring the work is done to completion within a tight timeline and budget.
2017 Born Digital Legal Deposit Policies and Practices (Frederick Zarndt)
This document summarizes the key details and findings of a survey conducted in 2014 and 2017 on born digital legal deposit policies and practices. The 2014 survey was sent to 20 national libraries and received responses from 17 libraries. It found that legal deposit laws varied widely, with Nordic countries leading in digital content capture while many others made no provision for digital. Only 7 countries addressed deposit of born-digital content. To update the survey, the authors expanded their team in 2017 and broadened the survey reach. The document reviews 17 previous related surveys from 2005-2016 on topics like audiovisual preservation, e-legal deposit, web archiving, and digital news preservation. It provides context on the goals and questions of each prior survey.
More Related Content
Similar to 201308 wlic standards committee zarndt et al the alto editorial board collaboration and cooperation across borders [singapore]
201308 wlic standards committee zarndt et al the alto editorial board collabo...Frederick Zarndt
The document provides information about the ALTO Editorial Board, which maintains the ALTO (Analyzed Layout and Text Object) XML standard for describing page layout and content metadata. The board was established in 2009 and is comprised of members from libraries and organizations around the world. The board aims to promote ALTO usage and ensure the standard evolves to meet emerging needs. Meeting agendas, procedures, and examples of board members' motivations for participation are presented.
This document summarizes the development of digital tools by the University Library "Svetozar Markovic" in Belgrade to handle METS/ALTO files of digitized historical Serbian newspapers. It describes:
1) A search interface and backend search engine that allows users to search 400,000 pages of newspapers and view search results and full text articles.
2) Tools developed for manual correction of OCR text errors and automatic detection of incorrectly scanned pages.
3) Experience gained from user feedback on the digital collection and recommendations for future work in newspaper digitization.
This presentation was provided by Ted Koppel ofAuto-Graphics, Inc, Ed Riding of SirsiDynix, Andrew K. Pace of OCLC, and John Mark Ockerbloom of The University of Pennsylvania, during the NISO webinar "Library Systems & Interoperability: Breaking Down Silos," held on June 10, 2009.
The IETF is organized into working groups that focus on specific topics and are overseen by area directors. It is formally part of the Internet Society and overseen by the Internet Architecture Board. The IETF began in 1986 and has grown significantly over time to include thousands of participants working to develop open standards for internet technologies through an open process.
The IETF is organized into working groups that focus on specific topics and are overseen by area directors. It is formally part of the Internet Society and overseen by the Internet Architecture Board. The IETF began in 1986 and has grown significantly over time to include thousands of participants working to develop open standards for internet technologies through an open process.
Briefing on OASIS XLIFF OMOS TC 20160121Jamie Clark
The briefing provided an overview of the XLIFF OMOS TC, which aims to develop an abstract object model and JSON serialization for the XLIFF 2.0 standard to improve interoperability. The TC is chaired by David Filip and has members from various organizations. It plans to deliver an object model for XLIFF 2.x, a JSON version called JLIFF 1.0, and work on a new version of TMX with an inline data model consistent with XLIFF 2.0. The TC uses a non-assertion IPR mode and invites participation from stakeholders in multilingual content and localization.
Rozz Evans
Collection Development Librarian Institute of Education, University of London spoke about the development content and future plans of Digital Education Resource Archive (DERA)
An Ontology For Historical Research DocumentsDereck Downing
The document describes an ontology-based digital archive called STOLE that collects historical journal articles from 1848-1946 about public administration in Italy. STOLE uses ontologies to semantically describe the domain knowledge and provide integration of data to support historians' research. The system architecture includes modules for an ontology, inference engine, triple store, SPARQL endpoint, and GUI. Future work plans to develop a graphical interface for ontology population and add semantic indexing capabilities.
Liberate Your Library Building A Scottish Consortium November 16th 2009Jonathan Field
The document outlines the agenda for an event on open source library management systems. It discusses the current state of the proprietary library system market and frustrations libraries face. It then introduces open source software and examples of successful open source projects. The rest of the agenda covers various open source library system modules and configuration. PTFS Europe is introduced as a supporter of open source systems like Koha and Evergreen, providing implementation and support services to help libraries migrate to these systems.
Stronger together: community initiatives in journal managementJisc
There has been a recent growth of initiatives to address common problems regarding current and long-term access to e-journal content. Jisc is at the forefront of many of these with the close participation and active input of educational institutions.
This session aims to summarise the current state of key themes with pointers to future directions of areas such as sustainability, the move towards e-only environments, and shared consortia approaches. It will provide an overview and panel discussion on developing the supporting infrastructure to meet the needs of users. The discussion will focus on how institutions, community bodies and service providers can best work together to ensure sustainable, long-term initiatives by seeking to introduce uniformity, standardisation and collaboration to an even greater extent.
The session will introduce two new Jisc-supported projects in this area, the Keepers Registry Extra and SafeNet initiatives, and discuss how these fit alongside existing Jisc services such as Knowledge Base+, UK LOCKSS Alliance, Journal Archives and JUSP (Journal Usage Statistics Portal). The panel will address how this catalogue of services contributes towards a coherent strategy in the management of e-journal content.
Slides accompanying a presentation delivered at the VII Congresso Nacional de Arquivologia in Fortaleza, Brazil, on October 19th, 2016. The slides provide an overview of the AtoM project's history, its maintenance by Artefactual, and its development philosophy, before proceeding to examine the application as a component used in a digital preservation ecosystem. Aspects of ISO 16363:2012, the Audit and Certification of Trustworthy Digital Repositories standard, are used to evaluate how AtoM can support description, management, administration, and access functions when used to maintain a chain of custody in a trustworthy digital repository ecosystem.
Leslie Johnston: Challenges of Preserving Every Digital Format, 2012lljohnston
The document discusses some of the challenges the Library of Congress faces in collecting and preserving digital content. It receives content in a wide variety of formats from different programs and partners. These include digitized newspapers, web archives, audiovisual content, tweets, and electronic publications. The Library uses various strategies to help manage this complex task, such as file format standards, multiple copies in different locations, and partnerships with other institutions. However, the diversity of formats and sources means preserving every digital format is extremely challenging.
Manchester Seminar Liberate Your Library October 2009Jonathan Field
The document discusses open source library management software options. PTFS Europe offers support for the open source software Evergreen and Koha. They can help libraries with implementation, hosting, training and ongoing support. Using open source software provides benefits like reduced costs, increased flexibility and opportunities for collaboration compared to proprietary systems.
Between 2009 and 2012 the Higher Education Funding Council for England (HEFCE) funded a series of programmes to encourage higher education institutions in the UK to release existing educational content as Open Educational Resources (OER) and to embed open practices in the institution. The HEFCE funded UK OER Programmes were run and managed by the JISC and the Higher Education Academy. Over the course of three years about £15M (€17,5M) was invested on projects that investigated the release and collection of OERs by individuals, institutions and subject communities. The Cetis “OER Technology Support Project” provided support for technical innovation across this programme.
In this conference paper we will present our reflections on the technical approaches taken, issues raised and the lessons learnt from the Programmes and the Support Project. The issues covered include resource management, resource description, licensing and attribution, search engine optimisation and discoverability, tracking OERs, and paradata (activity data about learning resources). Technical solutions discussed will include the use of social sharing platforms such as flickr and WordPress for resource dissemination; metadata embedded in HTML documents as RDFa, microdata and using the schema.org ontology; and sharing metadata and paradata using the Learning Registry (a network of schema-free data stores). As well as describing the achievements of the programme, we will also discuss the difficulties encountered and identify areas where further work is required.
Harvesting Repositories: DPLA, Europeana, & Other Case Studieseohallor
Join this discussion on the benefits and process of harvesting to aggregators such as DPLA, Europeana and other aggregators. Through case studies we'll outline three stages of the process, including 1) mapping, migrating, and normalizing data in open source digital repositories, 2) making use of the Open Archives Initiative Protocol for Metadata Harvesting (OAI - PMH), and 3) reaping the benefits of increased exposure. Presenters welcome lively discussion and questions from participants of all technical backgrounds and skill levels.
Similar to 201308 WLIC Standards Committee, Zarndt et al., “The ALTO Editorial Board: Collaboration and Cooperation across Borders” [Singapore]
Digitization of the Tuol Sleng Genocide Museum ArchivesFrederick Zarndt
This document provides a summary of a report on a project to preserve, digitize, index and host archives from the Tuol Sleng Genocide Museum in Cambodia. The project aims to spread an objective vision of history by digitizing over 400,000 pages of materials related to the Khmer Rouge regime and Tuol Sleng prison. Key aspects of the project include training museum staff, digitizing the materials to high standards, creating searchable databases and indexes, and developing a public-facing website with crowd-sourcing capabilities to engage the Cambodian people. Challenges include the fragile materials, limited local skills and equipment, and ensuring the work is done to completion within a tight timeline and budget.
2017 Born Digital Legal Deposit Policies and PracticesFrederick Zarndt
This document summarizes the key details and findings of a survey conducted in 2014 and 2017 on born digital legal deposit policies and practices. The 2014 survey was sent to 20 national libraries and received responses from 17 libraries. It found that legal deposit laws varied widely, with Nordic countries leading in digital content capture while many others made no provision for digital. Only 7 countries addressed deposit of born-digital content. To update the survey, the authors expanded their team in 2017 and broadened the survey reach. The document reviews 17 previous related surveys from 2005-2016 on topics like audiovisual preservation, e-legal deposit, web archiving, and digital news preservation. It provides context on the goals and questions of each prior survey.
In 2015, three of the authors (Zarndt, McCain, Carner) surveyed the born digital content legal deposit policies and practices in 18 different countries and presented the results of the survey at the 2015 International News Media Conference hosted by the National Library of Sweden in Stockholm, Sweden, April 2015.
As a first step, the authors reviewed previous surveys about legal deposit and digital preservation. The authors updated and streamlined the 2015 survey in order to assess progress in creating or improving national policies and in implementing practices for preserving born digital content. The current survey consists of as many as 20 questions; which questions are asked depends on the respondent’s previous answers.
More than 50 countries, along with individual states in Australia, Germany, and the USA, participated in the survey. The survey closed at the end of November 2017. The authors expect to repeat the survey periodically in order to assess progress in developing born digital legal deposit policy and implementing the policy in practice.
What did you say? interculture communication [20160308 phnom penh]Frederick Zarndt
The single biggest problem in communication is the illusion that it has taken place. George Bernard Shaw, Irish playwright, co-founder of the London School of Economics, and winner of the Nobel Prize in Literature (1925).
Projects are about communication, communication, and communication. B. Elenbass in "Staging a project: Are you setting your project up for success?"
What one says to compatriots in face-to-face conversation is often misunderstood; imagine the possibilities for misunderstandings with someone from halfway around the world, natively speaking another language, and living in a different culture! In such circumstances how can you be sure that your collocutor has understood you in face-to-face (hard), telephone (harder), and email (hardest) conversations? Without being fully present in the conversation -- mindfully aware -- whether it's face-to-face, by Skype or phone, or through email, successful communication is difficult, even more so for intercultural communication.
The ubiquity of English facilitates basic communication, but its use as a common language frequently disguises cultural differences. Furthermore, to say that English (or any other language) can be ambiguous, is an understatement. But regardless of language, clear communication is essential for success in any collaborative undertaking whether done by a small co-located group or by a globally dispersed team.
This tutorial teaches mindful communication and describes frameworks useful in understanding cultural differences and gives real-life examples of misunderstandings due to such differences. Expect to take away practical tools to understand your own cultural biases and in-class practice mindful communication with your colleagues from other cultures as well as your own. You will also learn about frameworks for understanding other cultures based on work by Geert Hofstede, Fons Trompenaars, and others as well as on the presenter's own experiences.
Coronado public library digital newspapers workshop local partnerships [oct 2...Frederick Zarndt
Using digitized historical newspapers for genealogical research
Brian Geiger, California Digital Newspaper Collection
Frederick Zarndt, IFLA Governing Board
1. Introductory remarks: Who we are; focus on freely available collections and especially those that allow researchers to create accounts; numerous sites they can pay to access but we won’t spend much time on them
2. Only small percentage of surviving newspapers have been digitized
3. How newspapers are digitized. Focusing especially on OCR, if it’s not OCR’ed well it’s not discoverable
4. How Coronado newspapers were digitized. CDNC’s work with the public library, Coronado Public Library’s work with the publisher, the process of scanning the film and processing the images, etc.
5. Free vs. Pay. 2 kinds of digitized newspaper archives: 1) publicly funded and available for free, 2) commercial sites you pay to access. Dozens or even hundreds of public sites, from small institutional to national.
6. Google won’t always get you what you want
7. Basic search using Elephind: What elephind is. Search “Abraham Lincoln” and explain what they see. Described “facets”
8. CDNC advanced search
9. Collecting What You Find: Right-click features in the CDNC
10. Collecting What You Find: CDNC user accounts
11. Interacting with Content: CDNC
12. Interacting with Content: Tagging and commenting in CDNC
Coronado public library digital newspapers workshop [Oct 2016]Frederick Zarndt
What did you say? mindful interculture communication [201608 icgse]Frederick Zarndt
Here Today, Gone within a Month: The Fleeting Life of Digital NewsFrederick Zarndt
In 1989 on the shores of Montana’s beautiful Flathead Lake, the owners of the weekly newspaper the Bigfork Eagle started TownNews.com to help community newspapers with developing technology. TownNews.com has since evolved into an integrated digital publishing and content management system used by more than 1600 newspaper, broadcast, magazine, and web-native publications in North America. TownNews.com is now headquartered on the banks of the mighty Mississippi river in Moline Illinois.
Not long ago Marc Wilson, CEO of TownNews.com, noticed that of the 220,000+ e-edition pages posted on behalf of its customers at the beginning of the month, 210,000 were deleted by month’s end.
What? The front page story about a local business being sold to an international corporation that I read online September 1 will be gone by September 30? As well as the story about my daughter’s 1st place finish in the district field and track meet?
A 2014 national survey by the Reynolds Journalism Institute (RJI) of 70 digital-only and 406 hybrid (digital and print) newspapers conclusively showed that newspaper publishers also do not maintain archives of the content they produce. RJI found a dismal 12% of the “hybrid” newspapers reported even backing up their digital news content and fully 20% of the “digital-only” newspapers reported that they are backing up none of their content. Educopia Institute’s 2012 and 2015 surveys with newspapers and libraries concur, and further demonstrate that the longstanding partner to the newspaper—the library—likewise is neither collecting nor preserving this digital content.
This leaves us with a bitter irony, that today, one can find stories published prior to 1922 in the Library of Congress’s Chronicling America and other digitized, out-of-copyright newspaper collections but cannot, and never will be able to, read a story published online less than a month ago.
In this paper we look at how much news is published online that is never published in print or on more permanent media. We estimate how much online news is or will soon be forever lost because no one preserves it: not publishers, not libraries, not content management systems, and not the Internet Archive. We delve into some of the reasons why this content is not yet preserved, and we examine the persistent challenges of digital preservation and of digital curation of this content type. We then suggest a pathway forward, via some initial steps that journalists, producers, legislators, libraries, distributors, and readers may each take to begin to rectify this historical loss going forward.
An international survey of born digital legal deposit policies and practices ...Frederick Zarndt
That news publication has changed dramatically since the advent of the Internet and the Web is no news to anyone. There are many examples of established news organizations that have either stopped printing newspapers or shifted to publishing news on websites or through social media such as Facebook and Twitter. There are even more examples of new news organizations that have never printed news on paper and are digital only.
To the authors’ knowledge, every country has one or more legal deposit organizations tasked with preserving news for future generations. Legal deposit laws in some countries have been amended to include news that may never be instantiated on paper (born digital news). However, legal deposit laws are by no means universally amended and, even when such amendments have been made, their embodiment in practice varies widely.
As a follow-on to the paper Missing links: The digital news preservation discontinuity (http://www.ifla.org/node/8933) presented in August 2014 at IFLA News Media section satellite conference at the ITU Library in Geneva, Switzerland, the authors have surveyed cultural heritage organizations (libraries) around the world about their respective national born digital legal deposit policies and practices. We share the survey results and consider the ramifications of inadequate born digital news preservation policies and practice to future generations.
20140628 crowdsourcing, family history, and long tails for libraries [ala ann...Frederick Zarndt
In all of its many flavors, crowdsourcing works. It works for cultural heritage organizations too. During this presentation we look at various aspects of crowdsourced OCR text correction, commenting, and tagging for digitized historical newspapers at the National Library of Australia’s Trove, the California Digital Newspaper Collection (CDNC), and the Cambridge Public Library in Cambridge, Massachusetts, as well as the astounding number of historical birth, death, marriage, census, and other records transcribed by “crowd” volunteers at FamilySearch. Aspects covered include demographics, experiences, motivation, quality, preferred data, economics, and marketing. You will see that crowdsourcing is not only feasible but also practical and desirable. You will wonder why your own cultural heritage organization hasn't begun its own crowdsourcing project!
20131019 digital collections - if you build them will anyone visit [library 2...Frederick Zarndt
This document discusses digital historical newspaper collections in libraries and their visibility on the internet. It finds that while libraries spend significant resources digitizing collections, the collections receive little internet traffic and have poor search engine results. Some key points made include:
- Historical newspaper collections are among the most used collections in libraries with digital texts but receive low percentages of overall website traffic.
- Searching sample collections for information on the Gallipoli campaign yields few or no results from the library collections in the first 100 Google/Google News search results.
- Simple changes like adding XML sitemaps and adjusting robots.txt files can significantly increase search engine indexing and traffic for digital collections.
20130903 what did you say? interculture communication [hamburg]Frederick Zarndt
This document discusses intercultural communication and misunderstandings. It provides quotes and principles about the importance of effective communication to build understanding between people from different cultures and avoid assumptions. It notes that a lack of communication or poor communication can lead to more assumptions and misunderstandings.
2013 ifla satellite zarndt et al [marketing cultural heritage digital collect...Frederick Zarndt
This document discusses digital newspaper collections held by various libraries and the challenges of getting these collections discovered through search engines. It provides data on the size and date ranges of different newspaper collections, then analyzes the results of searching for articles on the Gallipoli Campaign from 1915-1916 across the collections and major search engines. It finds that none of the library collection articles show up in the first 100 search results, despite holding relevant articles, and attributes this to libraries not optimizing their websites and collections for search engines through things like sitemaps and robots.txt files. The document advocates for libraries to spend more on promoting, presenting and optimizing their digital collections for search visibility.
Abstract
Many library digital text collections created from pre-digital era materials (books, journals, magazines, etc.) and nearly all library digital historical newspaper collections use digital images (TIFF, JPEG, JPEG2000, etc.), OCR software (ABBYY FineReader, Nuance Omnipage), METS XML, ALTO XML, and various metadata standards (MODS, Dublin Core, PRISM, MIX, etc.). Both METS (Metadata Encoding and Transmission Standard) and ALTO (Analyzed Layout and Text Object) are XML standards developed by the international library community and administered (hosted) by the Library of Congress at http://www.loc.gov/standards/mets/ and http://www.loc.gov/standards/alto/ respectively.
The current editorial board has members from the National Library of Finland, the British Library, the Singapore National Library Board, the Bibliothèque nationale de France, the Koninklijke Bibliotheek (National Library of the Netherlands), the Library of Congress, the University of Kentucky, the University of California Riverside, and a software company, Content Conversion Specialists. All but two are IFLA members, and several serve on other standards boards in addition to the ALTO board. (You can see the list of current editorial board members at http://www.loc.gov/standards/alto/community/editorialboard.html.)
With members in cities that span 16 time zones, you can imagine that collaboration, cooperation, and good communication are essential to achieving anything. Of course a willingness of the members in the outlying time zones to get up early or stay up late is indispensable too. Good telecommunications infrastructure is imperative, and, as we will see, free and easy tools (Skype) are sometimes not reliable.
This paper gives an account of the history of the ALTO XML standard, of the ALTO Editorial Board, and of the ways that the board organizes itself and conducts its business. The paper describes the collaborative process used by the board to receive, review, and adopt changes to the standard, and it gives special attention to the step-by-step process collaboratively developed to track and implement changes to ALTO. And last, but far from least, it informally examines members’ motivations for participation in the board.
Keywords: library standards, XML standards, ALTO XML, OCR, digitization
1. Overview
The Analysed Layout and Text Object (ALTO) standard is an XML schema of metadata for describing the layout and content of physical text resources such as pages of a book or a newspaper. ALTO accurately captures technical details of text pages such as the position of characters, words, paragraphs, illustrations, footnotes, etc. These details make it possible for access systems like Chronicling America, Papers Past, Trove, and NewspapersSG (and many others) to precisely locate and show a character, word, paragraph, illustration, or footnote on a page image.
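For illustration, a minimal ALTO fragment recording the position of two words on a page might look like the following sketch. The element and attribute names (Layout, Page, TextBlock, TextLine, String, HPOS, VPOS, WIDTH, HEIGHT) come from the ALTO schema; the IDs, coordinates, and word content are invented for this example:

```xml
<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout>
    <Page ID="P1" WIDTH="10000" HEIGHT="14000">
      <PrintSpace HPOS="0" VPOS="0" WIDTH="10000" HEIGHT="14000">
        <TextBlock ID="TB1" HPOS="1200" VPOS="3400" WIDTH="2000" HEIGHT="120">
          <TextLine HPOS="1200" VPOS="3400" WIDTH="2000" HEIGHT="90">
            <String CONTENT="Abraham" HPOS="1200" VPOS="3400" WIDTH="620" HEIGHT="90"/>
            <SP WIDTH="30" HPOS="1820" VPOS="3400"/>
            <String CONTENT="Lincoln" HPOS="1850" VPOS="3400" WIDTH="580" HEIGHT="90"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>
```

Because every String carries its own coordinates, an access system can draw a highlight box around any word directly on the scanned page image.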
ALTO is an open standard and may be freely used by anyone.
The Metadata Encoding and Transmission Standard (METS) is also a library XML standard often used in conjunction with ALTO. While METS XML [1] can represent the structure of a variety of digital objects (text, video, audio), it cannot, nor is it intended to, describe the text components (paragraphs, words, characters, illustrations, etc.) of a textual digital object such as a book, magazine, or newspaper. The ALTO XML standard has been developed for this purpose. And while ALTO XML files are primarily intended for use with METS XML files, they may also be used independently.
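As a sketch of how the two standards fit together, a METS document typically points at its ALTO files from its file section; the element names below are from the METS schema, while the file IDs and paths are hypothetical:

```xml
<mets:fileSec xmlns:mets="http://www.loc.gov/METS/"
              xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:fileGrp USE="ocr">
    <mets:file ID="ALTO0001" MIMETYPE="text/xml">
      <mets:FLocat LOCTYPE="URL" xlink:href="alto/page-0001.xml"/>
    </mets:file>
  </mets:fileGrp>
</mets:fileSec>
```

METS thus carries the object-level structure (which pages exist, in what order, with what images), while each referenced ALTO file carries the word-level layout of a single page.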
The remainder of this paper will recount the history of the ALTO schema, the history of the ALTO Editorial Board, the ALTO administration and maintenance process, and the operation of the ALTO Editorial Board. For those who are interested, ALTO technical details may be found at http://www.loc.gov/standards/alto/.
2. A Brief History of ALTO XML
METAe was a 3-year EU-funded research and development project which began in September 2000. The project was a collaboration of 14 partners from 7 European countries and the USA, coordinated by the University of Innsbruck [2]. The ALTO XML standard is one of the products of the METAe project.
One of the METAe partners, Content Conversion Specialists (CCS), administered and maintained the ALTO standard until August 2009. Then responsibility for its administration and maintenance was transferred to the Library of Congress and the ALTO Editorial Board. The version of the ALTO schema was changed to 2.0 as part of the transfer; it is identical to the previous version (1.4) except for the addition of a loc.gov namespace URI and an updated URI import reference for the inclusion of xLink functionality.
One of the first uses of ALTO for a mass digitization project began in 2004 during the inital
phase of the Library of Congress’s National Digital Newspaper Program(NDNP)3
. ALTO is
1
METS has been and continues to be developed by the METS Editorial Board whose members are drawn from
the international library community. The Library of Congress administers and maintains the METS standard,
schema, and documentation. See http://www.loc.gov/standards/mets/.
2
METAe project members were Leopold-Franzens-Universität, Institut für Angewandte Informatik Universität
Linz, Mitcom Neue Medien GmbH, CCS Content Conversion Specialists GmbH, Universidad de Alicante,
Friedrich-Ebert-Stiftung, Cornell University Library Department of Preservation and Conservation,
Bibliothèque nationale de France, The National Library of Norway, Biblioteca Statale A. Baldini, Dipartimento
di Sistemi e Informatica University of Florence, Universitätsbibliothek Karl-Franzens-Universität, Scuola
Normale Superiore Centro di Ricerche Informatiche per i Beni Culturali, and Higher Education Digitisation
Service HEDS. Also see http://meta-e.aib.uni-linz.ac.at/.
3 A detailed description of the National Digital Newspaper Program (NDNP) and the technical requirements are
found at http://www.loc.gov/ndnp/. Chronicling America (http://chroniclingamerica.loc.gov/) is the access
software system for NDNP data.
an essential component of NDNP because it facilitates the capture of text reading order and
word position on each OCRed newspaper page. ALTO makes it possible for the NDNP access
system, Chronicling America (http://chroniclingamerica.loc.gov), to show (highlight) a
search term’s position on a newspaper page. Without this capability users would find it
difficult indeed to locate the search term because newspapers may have 8,000 to 15,000
words on a page.
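As a sketch of how ALTO captures word position, each OCRed word is a String element whose HPOS, VPOS, WIDTH, and HEIGHT attributes give its bounding box in page coordinates; access software can draw a highlight box directly from these values. The coordinates and IDs below are invented for illustration.

```xml
<TextLine HPOS="120" VPOS="400" WIDTH="520" HEIGHT="28">
  <!-- One String element per word; reading order follows document order -->
  <String ID="S1" CONTENT="Chronicling" HPOS="120" VPOS="400" WIDTH="210" HEIGHT="24"/>
  <SP WIDTH="12"/>
  <String ID="S2" CONTENT="America" HPOS="342" VPOS="400" WIDTH="150" HEIGHT="24"/>
</TextLine>
```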
Chronicling America is one of the first and most prominent users of ALTO XML, but today it
is far from the only one. There are many, many text digitization projects around the world
that use ALTO XML.
3. An Even Briefer History of the ALTO Editorial Board
The ALTO Editorial Board was formed at the same time administration and maintenance of
ALTO was transferred to the Library of Congress. In August 2009 it met for the first time at
the National Library of Sweden in conjunction with an IFLA / National Library of Sweden
sponsored newspapers symposium “The present becomes the past”4. Initially the board was
composed of volunteer library professionals with an interest in text digitization, the library
standards liaison from the Library of Congress, and employees of CCS.
The authors of this paper are the current board members. As one can see, board members are
a diverse and cosmopolitan bunch with members from Europe, North America, and
Singapore. Board member locations span 15 time zones from UTC – 7 hours to UTC + 8
hours. Consequently, and some board members would say unfortunately, some members must
get up early for a meeting while others must stay up late.
Since August 2009 the intention of the board has been to meet monthly by teleconference or
web conference. In practice the board meets when a quorum of its volunteer members has
free time from their other responsibilities. Sometimes the meetings are held every other
month (or even less frequently). Very rarely there may be two meetings in a single month.
4 The conference program is found at http://www.kb.se/english/about/news/Present-past/ (accessed July 2013).
The conference was co-sponsored by the IFLA Preservation & Conservation Section, and IFLA Core Activity
on Preservation & Conservation (PAC).
4. How the Board Works
The purpose of the ALTO Editorial Board is to maintain editorial control of
ALTO, its XML schema, and official ALTO documentation. Additionally, the
Board promotes the use of the standard and endorses best practices in the use
of ALTO as the practices emerge. The ALTO Editorial Board is representative
of important communities of interest for ALTO.5
The current board has both library and industry members. All board members from libraries
are consumers of ALTO; some library members are also ALTO XML file producers. Industry
members are producers of ALTO XML (service bureaus) or creators of ALTO production
software (CCS). It is important that both the producer and consumer viewpoints are
represented because each has a very different perspective; for example, as ALTO consumers,
libraries are far more concerned about backward compatibility.
So far, board members have been recruited on an ad hoc basis, to replace members who have
left the board or to fill other needs. In other words, there is no formal recruitment policy.
Recruitment policy is likely to become more formal as the ALTO board matures. Draft board
membership criteria, modeled after the METS Editorial Board membership criteria, are listed
in Appendix 1.
There are no restrictions on who can submit a proposal to change or extend ALTO. Since the
Library of Congress began administration and maintenance of ALTO, no changes have been
made to ALTO, but there are several proposals before the board. These proposals have been
submitted by ALTO board members, by the IMPACT project6, and by the Bibliothèque
nationale de France.
The ALTO board developed an almost self-explanatory proposal submission template (cf.
Appendix 2). The fields most needing explanation are “champion” and “backwards
compatible”.
ALTO board members are volunteers, and, as already mentioned, have full-time work
elsewhere. Asking a board member to carefully study one or two proposals of the dozen or
so proposals before the board is much more doable than asking him/her to study all of them.
Hence, in order to consider proposals efficiently and to reduce the demands on board
members’ time, each proposal is adopted by a board member who volunteers to “champion”
it. The champion studies the proposal, considers its implications for the current ALTO
5 If the statement of purpose sounds suspiciously like the METS statement of purpose, that’s because ALTO’s
statement is patterned after the METS statement (cf. http://www.loc.gov/standards/mets/mets-board).
6 IMPACT, or Improving Access to Text, was an EU-funded project whose objective was to “significantly
improve access to historical text and to take away the barriers that stand in the way of the mass digitisation of
the European cultural heritage” (cf. http://www.impact-project.eu/). The IMPACT project has been succeeded
by the IMPACT Centre of Competence whose goal is to “make the digitisation of historical text better, faster,
cheaper” (cf. http://www.digitisation.eu/).
schema, especially on backward compatibility, and, during the course of one or more
meetings, explains to other board members why the proposal ought to be adopted or rejected.
In other words, a proposal’s champion is its advocate, but not in the usual sense of the word:
the champion may also advocate for the proposal’s rejection if it is inappropriate.
Backward compatibility is most important for digital library software which uses ALTO. In
order to use ALTO XML files which incorporate new features from a changed ALTO
schema, the software may very likely have to change. If the new feature “breaks” some
feature in the old schema, then it is necessary to have two different code paths to accommodate
both new and old ALTO files7. A “breaking change” is something that software developers and
digital preservationists strongly prefer to avoid.
Obviously backward compatibility is an important consideration with any schema change. It
is preferable to deprecate old features in favor of an improved new feature since, when a
feature is deprecated, files produced with the old and new versions of the schema remain
(more or less) compatible with each other. Deprecated features may be supported for one or
two future versions and thereafter completely phased out.
If backward compatibility is preserved in a new ALTO schema, the schema will be released
as a minor version, for example, version 2.1. If, for some reason, backward compatibility is
broken by a new schema, a major version will be released, for example, 3.0.
New schemas will be released as needed but no more than twice in a year and on a schedule
which matches new METS schemas (January and July). A draft schema will be available for
public comment at http://www.loc.gov/standards/alto/ one month prior to its release as a new
major or minor version.
7 It may or may not be possible to transform ALTO files conformant to the old ALTO schema into files
conformant to the new schema.
The ALTO Editorial Board has a wiki for meeting agendas, meeting minutes, working
documents, and change proposals (http://altostandard.pbworks.com/). The public is welcome
to join the wiki and to comment on current change proposals. One can request access on the
wiki homepage.
The ALTO board believes that a stylistically uniform schema will facilitate the understanding
and use of ALTO so it has drafted and will soon formally adopt design principles (see
Appendix 3).
5. Meetings
As mentioned above, meetings are mostly by teleconference or web conference.
Occasionally, as work schedules and, even more importantly, as employer budgets allow, the
ALTO board meets face-to-face, and always in conjunction with another library conference
such as the DLF Forum or the IFLA World Library and Information Congress.
By consensus the board meets at 2pm UTC on a Thursday. There has been some attempt to
settle on a particular Thursday of the month, for example, the 1st or 2nd Thursday, but
member schedules have proved too variable for this. The date of the next meeting is decided
during the current meeting (preferably), by email (if need be), or by Doodle poll (last resort).
One of the board members (currently Frederick) assembles a draft agenda prior to a
scheduled meeting and emails it to board members via the ALTO listserv. The email with the
draft agenda asks other board members for additional agenda items for the next scheduled
meeting. The draft agenda is also posted to the ALTO wiki. The final agenda as well as the
URL of the minutes from the last meeting are emailed to ALTO listserv members a couple of
days prior to the scheduled meeting.
The board has tried plain teleconference, Skype, Skype with desktop sharing, and Webex for
its meetings. In principle either Skype or Skype with desktop sharing is attractive because
Skype is so widely used and free or inexpensive (Skype desktop sharing isn’t free). But both
have proven to be extraordinarily unreliable. Fortunately the employer (CCS) of one of the
board members has a Webex subscription. Webex is very reliable, and because it allows
desktop sharing, it also gives the meetings both a visual and an audio channel. In future we
may try Google Hangouts in order to remove our dependency on a fee-based subscription
service.
We recommend that members call from a quiet place or, if this is not possible, to mute their
microphones when they are not speaking. Background noise, like keyboard typing, music
playing, or officemates talking, can make it very difficult to hear and understand what’s being
said.
One of the board members (currently Frederick) moderates the meetings. All members are
encouraged and expected to contribute, sometimes extemporaneously and at other times after
prior preparation. For example, one of the current outstanding action items (see below)
which does require preparation outside of a board meeting is to create design principles to
guide future changes to ALTO.
Most meetings produce one or more action items. Each action item is assigned to one or
more board members. An action item does have a start date, the date on which the item was
created, and may have a completion date, for example, “by the next meeting”. But since
board members are volunteers with full-time “other” employment, no one is chastised for
missing a deadline. Spoken or unspoken commendation by one’s fellow board members and
the knowledge that one is being of service to the library community is the only positive
motivation while the only negative motivation is the wish to avoid the embarrassment of
disappointing fellow members.
6. Board Member Motivations
What motivates board members to join and participate? All members are full-time employees
of other organizations and presumably have plenty to do for their employers. Board members
are unpaid volunteers and must therefore be intrinsically motivated. According to Wikipedia
…intrinsic motivation refers to motivation that is driven by an interest or
enjoyment in the task itself, and exists within the individual rather than relying
on external pressures or a desire for reward. [People] who are intrinsically
motivated are more likely to engage in the task willingly as well as work to
improve their skills, which will increase their capabilities.8
This is obvious to anyone who has volunteered for a demanding but uncompensated position
with non-trivial tasks, or to anyone who has managed a volunteer organization. It is
nevertheless important to keep in mind both board members’ full-time work and intrinsic
motivations: both factors contribute to members’ independent observations, perceptions, and
opinions.
Perhaps the question of motivation is best answered by some members themselves:
The Library of Congress has a strong interest in maintaining library
standards in general, and digital standards are particularly important, given
the ever-changing nature of the medium. I serve on the Board to ensure that
changes keep pace with technology but also retain functionality for the large
body of existing data from years of scanning efforts.
Nate Trail, Library of Congress, Washington DC USA
Bibliothèque nationale de France has used ALTO from the very beginning of its
digitization projects, and it now has millions of ALTO pages available for
preservation and diffusion purposes. ALTO is a great tool used every day,
8 Wikipedia contributors, "Motivation," Wikipedia, The Free Encyclopedia,
http://en.wikipedia.org/wiki/Motivation (accessed Jul 2013).
everywhere. But ALTO also has a future, and the ALTO board is the right
place to build it.
Jean-Philippe Moreux, Bibliothèque nationale de France, Paris France
The Singapore National Library Board (NLB) uses the ALTO standard
extensively for its popular NewspaperSG service. The ALTO editorial board
provides me the opportunity to meet and work with members with substantial
experience with the ALTO standard and implementations.
Kia Siang Hock, Singapore National Library Board, Singapore
The Koninklijke Bibliotheek (KB) began digitizing printed material on a large
scale around 2005. Shortly after that ALTO was chosen, and it is still used as an
important part of the format the KB has designed for its now many millions of
pages of digitized material, a number that is still growing. In the future we hope that it will also be
possible to improve the quality of the digitized collection, for example, the
quality of the text. For these reasons the KB as well as I are interested in
helping the community to maintain and develop the standard.
Evelien Ket, Koninklijke Bibliotheek, den Haag, the Netherlands
Since 2000 I’ve been creating digitization workflow software and/or managing
text digitization projects of all sizes. As an ALTO board member I have the
opportunity to influence the future direction of one of the principal standards
used in text digitization. Besides, if one belongs to a community, one has an
obligation to contribute to it.
Frederick Zarndt, IFLA Newspapers Section, Coronado CA USA
7. Conclusion
As a standard ALTO has had an interesting life, coming out of a joint academic / commercial
project, being maintained for a while by a particular company (CCS), and then returning to a
public, open standard. It has a proven track record, and the millions of documents scanned and
expressed using it ensure that it will continue for many years. Paired with METS, this
standard ensures that digitized paper documents can be electronically understood with
precision and clarity. The Board is committed to maintaining its usefulness into the future.
Diagram Showing Use of METS and ALTO XML Files to Represent a Text-based Digital
Object such as a Book, Magazine Issue, or Newspaper Issue
Appendix 1: ALTO Editorial Board Membership Criteria
1. The ALTO Editorial Board maintains editorial control of ALTO, its XML Schema, and
official ALTO documentation. Additionally, the Board promotes the use of the standard
and endorses best practices in the use of ALTO as they emerge. The ALTO Editorial
Board is representative of important communities of interest for ALTO.
2. Board member criteria:
a. Have significant experience with ALTO implementation or ALTO-related tool
building either previously or currently
b. Represent either currently or previously one or more of the following
constituencies from the international digital library community:
i. National, Academic or Public Libraries
ii. Information services utilities
iii. Governmental agencies or organizations
iv. Vendors supporting digital library operations
c. Demonstrate experience in one or more of the following areas:
i. XML or other information encoding languages
ii. Digital library or digital repository implementations using ALTO
iii. Metadata for digital libraries or repositories such as descriptive,
administrative, structural, or transport schemas
iv. Tool development and/or use for digital information creation, capture,
storage and management, discovery or retrieval
d. Committed support from the home institution for telephone and web
conference calls, face-to-face meetings when feasible, in-person or online training
events, and other Board activities
e. Demonstrate ability and interest in developing and fostering the use of ALTO
within digital libraries / repositories and building a strong ALTO community of
implementors
f. Ability to commit to a 3 year term with the possibility of renewal.
3. Expectations for ALTO Board member participation:
a. Makes a serious commitment to meeting the Mission & Objectives of the ALTO
Editorial Board.
b. Participates actively in the work of the Board to maintain and promote the ALTO
schema.
c. Regularly and actively participates in periodic telephone and web conference
calls, as well as face-to-face meetings when feasible.
d. Prepares for meetings, stays informed about committee and work group activities,
and reviews and comments upon meeting notes, committee and work group
reports, and ALTO communication media as appropriate including the ALTO
listserv and ALTO wiki.
e. Gets to know other Board members and builds collegial relationships among
Board members that contribute to informed and congenial decision-making.
f. Exercises professional judgment about changes to ALTO and the impacts of
changes upon current implementations as known by personal experience or by
input from other ALTO implementors.
g. Participates in committee or work group activities, educational training events and
other ALTO promotional and fundraising activities.
h. Commits to a 3 year term of appointment with the possibility of renewal.
Appendix 2: ALTO XML Change Proposal Template
Champion: board member name
Submitter: submitter name and email
Submitted: YYYY-MM
Status: submitted / discussion / review / accepted | rejected / draft / published
   submitted - initial status when proposal is submitted
   discussion - proposal is being discussed within the board
   review - xsd code is being reviewed
   accepted - proposal is accepted
   rejected - proposal is rejected
   draft - accepted proposal is in public commenting period
   published - proposal is published in a schema version
Backwards compatible?: UNCLEAR / YES / NO
ALTO version: version where proposal will be included
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Aliquam ac ultricies augue.
Pellentesque consequat interdum nulla, placerat ultricies magna scelerisque nec. Phasellus
eget pellentesque magna. Phasellus dolor leo, vulputate ut tempor sit amet, ultrices eget
turpis. Ut ornare convallis euismod. Aenean convallis elit feugiat augue dapibus pharetra. In
at leo purus. Fusce faucibus iaculis orci, a luctus augue ullamcorper quis. Nulla magna nibh,
elementum ut pellentesque non, fringilla sagittis nulla.
Example
<Glyph ID="P4_ST00001_G02" CONTENT="2">
<Shape>
<Rectangle HPOS="240" VPOS="223" WIDTH="10" HEIGHT="24"/>
</Shape>
<Variance CONFIDENCE="0.5">s</Variance>
<Variance CONFIDENCE="0.1">8</Variance>
</Glyph>
Current Schema ALTO 2.0:

<xsd:complexType name="processingStepType">
  <xsd:annotation>
  </xsd:annotation>
  <xsd:sequence>

Proposed change:

<xsd:complexType name="processingStepType">
  <xsd:annotation>
    <xsd:documentation>A processing step.</xsd:documentation>
  </xsd:annotation>
  <xsd:sequence>
Appendix 3: ALTO Schema Design Principles
INTRODUCTION
The purpose of these principles is to provide guidance for future development of the ALTO
schema. The document defines naming-rules for elements and attributes so that a coherent
name-style will be used over time.
GENERAL
• The purpose of an element must be unambiguous; the information that is expected to be
stored in an element must be defined clearly as part of the schema documentation.
• Ensure the integrity of the data; information that has the same(!) semantics should only
be stored once in an ALTO file, to reduce file sizes and ensure integrity.
o References between XML elements should be established using ID/IDREF
mechanisms
o XML elements should be nested to represent a “consists of” relationship
between the real-world objects that the elements represent
o Information that qualifies the value of an element should be recorded as an
attribute of the element itself; this may include, for example, encoding information
• The ALTO schema should be used as a stand-alone schema and not borrow elements
from other namespaces. Therefore every ALTO document must define the ALTO
namespace at the root-element level; other namespace declarations on embedded
elements are not allowed.
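The ID/IDREF principle above can be illustrated with ALTO’s existing style mechanism: a text style is defined once under Styles and referenced from layout elements via STYLEREFS, so the font information is stored only once. The IDs and attribute values below are invented for illustration.

```xml
<Styles>
  <TextStyle ID="TS_body" FONTFAMILY="Times" FONTSIZE="10"/>
</Styles>
<!-- Later, in the Layout section: both blocks reference the same style by ID,
     so the font definition is not duplicated -->
<TextBlock ID="TB_1" STYLEREFS="TS_body"><!-- text lines --></TextBlock>
<TextBlock ID="TB_2" STYLEREFS="TS_body"><!-- text lines --></TextBlock>
```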
ELEMENTS AND ATTRIBUTES
• Names of XML elements and attributes must only contain ASCII letters.
• Names of XML elements and attributes shouldn’t be longer than 20 characters.
• All names must be Camel-Case.
SCHEMA DESIGN
• Elements (<xsd:element>) should be defined as global elements in the schema. Global
elements are elements that are direct descendants of the root element of the schema.
• Global elements should be re-used in the schema instead of defining local elements.
• Global elements must not have different names when they are re-used, except when
they are extended and a new element is derived.
• Cardinality of elements should be expressed explicitly in the schema (using
minOccurs and maxOccurs).
• The ALTO schema should not be modularized.
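A sketch of the explicit-cardinality principle; the element names below are illustrative rather than taken from the current schema:

```xml
<xsd:sequence>
  <!-- Cardinalities spelled out rather than left to XML Schema defaults -->
  <xsd:element name="Description" minOccurs="1" maxOccurs="1"/>
  <xsd:element name="Page" minOccurs="0" maxOccurs="unbounded"/>
</xsd:sequence>
```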
SPECIFIC PRINCIPLES
The current ALTO schema holds two different information objects:
• administrative metadata about the file and its provenance
• full text: the actual full text with layout information as well as provenance
information of the full text itself
The use scenarios for the two metadata types are different. Administrative metadata is very
rarely processed; it is just stored with the ALTO file in the repository and is usually not
queryable. Unlike the administrative metadata, the full text is queryable: it is stored in and
accessed from various systems for retrieval, exchange, and rendering.
Therefore the design requirements differ.
Specific requirements for Administrative Metadata
• Mixed-content elements (text and child elements) must be avoided
• The order of elements should be enforced wherever possible (using the XML Schema
sequence compositor)
Specific requirements for Full Text
• All changes that are made should be backward compatible, so that ALTO files that
conform to an old version of the schema will also conform to the new version of the
schema.