The document discusses crowdsourcing and the use of crowdsourced data in analyzing user experiences at the California Digital Newspaper Collection. It provides background on crowdsourcing concepts like the wisdom of crowds and cognitive surplus. It also examines specific crowdsourcing projects and their traffic rankings, including Wikipedia, Galaxy Zoo, and the National Library of Australia's Trove project.
20130321 Putting the world's cultural heritage online with crowdsourcing [roo... (Frederick Zarndt)
Brief history of crowdsourcing
Crowdsourcing at libraries around the world
Benefits of crowdsourcing
Demographics of library crowdsourcers
How to use various crowdsourcing web apps
The document summarizes a webinar presented by Greg Doyle on the Orbis Cascade Alliance Demand Driven Ebook Initiative. It discusses how the alliance of 37 academic libraries in the Pacific Northwest implemented a demand driven acquisition model with EBL to provide access to ebooks. Key points include how the model works, statistics on usage and spending from 2011-2013, and benefits and challenges of the model. The alliance has expanded the budget to $750,000 for fiscal year 2013 and hopes to increase access to more ebook titles.
Collecting sharing and improving data: changing roles for librarians and user... (Rose Holley)
- Trove is a digital library platform developed by the National Library of Australia that provides a single search interface to discover digital collections from libraries, archives, and museums.
- It has grown significantly since its launch in 2009 through contributions from over 1,100 organizations and the digitization of over 120 million items.
- Users can tag, comment on, and correct text within the collections, helping to enhance access and discoverability of content through their contributions and interactions.
Building and Managing Online Communities (Rose Holley)
The document discusses the development and management of the Trove online community platform in Australia. It summarizes how Trove began as the Australian Newspapers digitization project in 2007 and expanded in 2010 to become a single discovery service for libraries, archives and museums. It describes how Trove engaged users by allowing them to correct OCR text, add tags and comments, and contribute their own content, such as photos and videos. Over time, Trove saw increasing user contributions that helped improve and expand the collection.
Books at JSTOR - Book DOIs (2011 CrossRef Workshops) (Crossref)
JSTOR is expanding its platform to include books in addition to journals. It will launch a books program in June 2012 featuring frontlist and backlist titles from over 20 publishers. Books will be accessible at the chapter-level and integrated with JSTOR's existing journal content. JSTOR aims to enhance discovery of books and drive usage through its established platform. Publishers will provide metadata and register with CrossRef to enable linking between book chapters and cited materials. The program seeks to iteratively develop functionality for eBooks optimized for research and extractive reading.
Library discovery: past, present and some futures (lisld)
A presentation at the NISO virtual conference on Webscale Discovery Services, 20 November 2013.
Considers some of the issues that have led to the adoption of these services, and some future directions.
Distinguishes between discovery (providing a library destination) and discoverability (making stuff discoverable elsewhere).
This presentation was given at Bobcatsss2013 in Ankara.
Once the library assembled a collection and people came to the library to use it. Now, people build communication, workflows and behaviors around a variety of network resources. The library needs to think about how it is visible and relevant in those workflows and behaviors.
Library collections and the emerging scholarly record (lisld)
A high level review of collection trends followed by a summary of recent work on the evolving scholarly record.
Presented at the OCLC Research Library Partnership meeting at the University of Melbourne, 2 December 2015.
Towards collaboration at scale: Libraries, the social and the technical (lisld)
Libraries are now supporting research and learning behaviors in data rich network environments. This presentation looks at some examples focusing on how an emphasis on individual systems needs to give way to a broader view of process, workflow and behaviors.
It also discusses how this environment creates a demand for collaboration at scale among libraries.
Collection directions - towards collective collections (lisld)
How the emergence of new research and learning workflows in digital environments is affecting library collecting and collections. Several trends are reviewed. In the light of diversifying competing requirements, the need to manage down print and develop shared print responses is discussed.
Presentation to OCLC Asia Pacific Regional Council meeting. 13 Oct. 2014.
Rightscaling, engagement, learning: reconfiguring the library for a network e... (lisld)
1) The document discusses how libraries need to shift from being collection-centric to engagement-based by building new relationships on institutional and network levels.
2) It provides examples of how libraries can improve discovery and access through collaborative initiatives like shared print repositories and developing discovery layers.
3) Libraries are encouraged to explore distinctive engagement services that enhance student experience and research, like curating data assets and measuring researcher impact. This requires reallocating resources away from redundant infrastructure towards new partnerships.
This document discusses rethinking the library services platform (LSP) model to improve interoperability between systems. It notes that while new LSPs have emerged, significant lack of interoperability remains between components of the library technology ecosystem. The author argues that libraries should adopt a platform approach like Windows or Apple, where vendors provide tools and services to allow third parties to build applications on their platforms. This could encourage more applications and make platforms more valuable. Prioritizing the library user perspective may change how libraries think about LSPs. Standards bodies are working on interoperability issues but more remains to be done to fully integrate solutions.
Social metadata for libraries, archives and museums: Research findings from t... (Rose Holley)
The document summarizes the findings of the RLG Social Metadata Working Group regarding the use of social metadata in libraries, archives, and museums. The working group reviewed 76 relevant websites, surveyed 42 site managers, and developed 18 recommendations. Key recommendations include having clear objectives for social media use, establishing guidelines for staff and user-generated content, preparing staff, and continuously evaluating usability. The full report provides analysis of survey results, case studies of third-party site use, and additional recommendations.
This document summarizes OCLC's WorldCat database and services. WorldCat contains over 203 million bibliographic records and 1.6 billion library holdings. It is growing rapidly, with 57.5% of records in non-English languages. WorldCat.org provides a global library search engine and sees over 16.9 million monthly users. OCLC aims to help libraries share metadata faster and improve discovery at a global scale through WorldCat and related services.
The webinar discussed social reading experiences of sharing bookmarks and annotations in e-books. It covered three main presentations:
1. Todd Carpenter discussed a NISO working group on digital annotation requirements like specifying how annotations are rendered and challenges with annotating text.
2. Rob Sanderson presented on the W3C Open Annotation model for annotating web resources in RDF. It defines a basic model of annotations as comments linked to targets.
3. Dan Whaley discussed building an annotation platform that supports peer review of annotations to build reputation and scale to large numbers of users and annotations.
We used to think of the user in the life of the library. Now we think of the library in the life of the user. As behaviors change in a network environment, we have seen growing interest in ethnographic and user-centered design approaches. This presentation introduces this topic. It also explores changes in how we manage collections as an illustration of this shift towards thinking of the library in the life of the user.
This document discusses what business libraries are in and how they should reposition themselves. It argues that libraries should move away from being centered around physical collections and toward prioritizing user engagement, expertise, services and digital infrastructure. Specifically, it suggests that libraries focus on space that encourages social interaction and knowledge sharing, make their expertise more visible, provide more user-centered services, leverage cloud-based systems, and use data to better support research and learning.
20130630 What motivates library crowdsourcing volunteers? [ALA LITA] (Frederick Zarndt)
Crowdsourcing volunteers for digital newspaper collections are motivated by interests in history, genealogy and family history research. Volunteers report enjoying learning about the past and helping make historical information more accessible to others. While some volunteers are highly active, correcting hundreds of thousands of lines, most volunteers contribute to a long tail of less frequent but still valuable contributions. Crowdsourcing allows many individuals to collectively contribute to important historical and cultural preservation projects.
20130123 Crowdsourcing [hamilton library u of hi] (Frederick Zarndt)
The document discusses crowdsourcing and provides examples of popular crowdsourcing applications and websites. It defines key concepts in crowdsourcing like citizen science, crowdfunding, and crowdlearning. Examples provided include Galaxy Zoo, Kickstarter, Duolingo, and Mechanical Turk. Traffic and usage statistics from Alexa are presented for several crowdsourcing sites to illustrate their popularity and reach on the internet.
Community Generated Databases for NY State History Conference 2013 (Larry Naukam)
This document discusses community generated databases (CGDBs) which utilize volunteers outside of traditional organizations to create searchable historical records and collections. It provides examples of the Church Records Preservation Committee, New York Heritage, and Viewshare projects. CGDBs make collections more accessible and useful by indexing, transcription, and digitization done by community volunteers. Standards and training are important to ensure quality. CGDBs can unlock underutilized collections and engage new audiences through volunteer contributions.
The document discusses several resource discovery tools that can be used to search for scholarly materials across different types of content. It provides information on tools such as Google Scholar, EBSCO Discovery Service, ProQuest, SirsiDynix, Scopus, and WorldCat. Each tool is summarized, outlining its key features and functions in allowing users to discover resources for research and learning.
How can UK academic libraries respond to the current issues in scholarly publ... (Stuart Dempster)
Trends in publishing and collections development, and some opportunities for UK academic libraries to transform services to meet institutional and user requirements in a fast changing environment.
20120821 putting the world's cultural heritage online with crowd sourcing sli... (Frederick Zarndt)
The document discusses using crowdsourcing to put cultural heritage collections online. It describes how crowdsourcing is being used by various cultural institutions, such as the National Library of Australia, to transcribe text from digitized historical newspapers and documents. Crowdsourcing provides economic and other benefits, including improving the accuracy of optical character recognition, increasing the discoverability of collection items through better searchability, and engaging more of the public with cultural heritage collections online.
Leslie Johnston Keynote, Best Practices Exchange 2011 (lljohnston)
This document discusses how digital collections are now considered data rather than just records or content. It notes that researchers want to analyze entire collections as data sets rather than individual records. Large digital collections like web archives, historic newspapers, and Twitter archives contain billions of records that researchers want to query, analyze, and visualize as data. Institutions are collaborating through groups like the National Digital Stewardship Alliance and developing open source tools like ViewShare to support access to and preservation of these "big data" collections.
Presented by Peter Burnhill and Lisa Otty at the 36th Annual IATUL Conference in Hannover, Germany, 5 - 9 July 2015, "Strategic Partnerships for Access and Discovery".
Leslie Johnston: Library Big Data Repository Services, Open Repositories 2012 (lljohnston)
Big Data challenges in developing repositories include:
- Collections like web archives and historic newspapers contain billions of files and grow quickly, requiring constant processing and large-scale infrastructure.
- Researchers want to analyze entire collections using algorithms and computational methods rather than accessing individual items.
- Repository services need to support self-serve access, full-text search of entire collections, and APIs to enable computational research methods.
- Ingesting and providing access to collections measured in petabytes and containing highly diverse content and metadata requires normalization and standardization.
Hello islandora building a digital repository nov 30, 2016 v6 (eohallor)
Hosted at The New York Academy of Medicine on November 30, 2016.
Morning Session: Developing Islandora Digital Collections (Panel)
This panel discussion will explore multiple uses and implementations of Islandora, an open source digital repository framework. Panelists will describe their digital projects, how Islandora was utilized and their overall experience.
Afternoon Session: Islandora Demonstration (Hands-on)
Islandora is an OAIS-adherent, open source digital repository framework. It combines the Drupal CMS and Fedora Commons repository software; together with additional open source applications, the framework delivers a wide range of functionality out of the box.
This Islandora demonstration will provide users with an overview of how to ingest content, configure the discovery layer and restrict access to content.
The document discusses how ESIP (Earth Science Information Partners) uses virtual communities and platforms to facilitate collaboration around earth science data. It provides examples of how ESIP creates wikis, social media listening tools, and hybrid meeting spaces to allow distributed groups to find, discuss, and share data. These virtual "planks" and "workspaces" help scale collaboration across disciplines and communities. The goal is to support interoperability at multiple levels and allow earth scientists and IT practitioners to produce returns by working together in a connected, social way.
The document discusses various online tools for effective literature management and reference searching. It introduces popular tools like Mendeley, EndNote and Zotero for building local reference databases and sharing references online. Social bookmarking and networking sites like Diigo, SlideShare and Wikipedia are also covered that allow searching references through tags and connecting with other users.
Wiser Pku Lecture@Life Science School Pku (guest8ed46d)
The document discusses various online tools for effective literature management and reference searching. It introduces popular tools like Mendeley, EndNote and Zotero for building local reference databases and sharing references online. Social bookmarking and networking sites like Diigo, SlideShare and Wikipedia are described as useful resources for searching references in a social way through tags and user connections.
OCLC Research Update at ALA Chicago. June 26, 2017. (OCLC)
Rachel Frick, OCLC Executive Director of the OCLC Research Library Partnership, reviews some of the broad agenda items and recent publications related to the work of OCLC Research. Rachel is then joined for two presentations on specific research topics. First, Sharon Streams (OCLC Director of WebJunction) and Monika Sengul-Jones (OCLC Wikipedian-in-Residence) present on "Public Libraries and Wikipedia." Next, Kenning Arlitsch (Dean, Montana State University Library) and Jeff Mixter (OCLC Senior Software Engineer) share their findings on "Accurate Institutional Repository Download Measurement using RAMP, the Repository Analytics and Metrics Portal."
Web-Scale Discovery: Post Implementation (Rachel Vacek)
Discovery services provide users a single search box to access a library's entire pre-indexed collection. Representatives from two academic libraries serving different user populations will discuss marketing, instructing users, evaluating the product, and maintaining the resource after a discovery service is implemented.
Dennis Massie, OCLC, USA Come for the free analysis, stay for the community... (CTLes)
The document discusses the potential for the OCLC ILL Cost Calculator to serve as a new platform for the interlending community to share and analyze cost data. It describes lessons learned from other data projects like the IFLA Library Map of the World and Google Street View. While gathering comprehensive usage and cost data presents challenges given data sensitivity and availability issues, starting with a basic calculator and allowing for incremental contributions could help address these obstacles. The goal is to help users understand their own costs over time and compare to peers to inform resource sharing strategies.
This document discusses environmental scanning and Library 2.0. It defines environmental scanning as communicating external information that may influence organizational decision making. Key components of an effective scan include top-level support, objectives, methodology, communication of results, and action planning. Characteristics of effective scanning teams include seeing beyond status quo and having a big picture view. Library 2.0 utilizes new technologies like blogs, wikis and tagging to create more interactive websites and connect users. It provides options for users and increases access to information.
Digital Transformation and Data - the Wikimedia Residency at the University o...
Ewan McAndrew
Digital Transformation and Data - The Wikimedia Residency at the University of Edinburgh
This presentation took place at SCURL's "Libraries, Literacies & Learning" event on 23 March 2018.
Motivational Metrics: A Publisher and Library Collaboration
Danea Johnson
In response to increasing internal demand for and focus on metrics associated with the work of the National Academies of Sciences, Engineering, and Medicine, in 2015 the National Academies Press and the Research Center launched an institution-wide initiative to standardize our data collection processes and create "best practice" guidelines for staff. This ongoing initiative includes: developing a common definition for impact, a standardized taxonomy for data collection, development work on the internal metrics platform, staff training on data analysis, and marketing tools for reports to sponsors and new funding proposals. MacDonald and Willis will present on how their metrics initiative has impacted the work of librarians, researchers, and program staff, what they have learned about the increasing importance of metrics in scholarly publishing, and what they have in mind for the future.
Alphonse MacDonald is currently Acting Co-Executive Director of the National Academies Press. He has more than 20 years of experience in the digital media and publishing sectors and has developed electronic publishing and online outreach programs for a broad range of publishers and non-profit organizations including Island Press, Conservation International, and the National Academies of Sciences, Engineering, and Medicine.
Colleen Willis is currently the Manager of the Research Center at the National Academies of Sciences, Engineering and Medicine. She manages a staff of 4 research librarians who support the Academies with concept development, project proposals, conducting research, report writing and measuring impact.
Similar to 20121105 no tempest in my teapot [dlf forum denver] (20)
Digitization of the Tuol Sleng Genocide Museum Archives
Frederick Zarndt
This document provides a summary of a report on a project to preserve, digitize, index and host archives from the Tuol Sleng Genocide Museum in Cambodia. The project aims to spread an objective vision of history by digitizing over 400,000 pages of materials related to the Khmer Rouge regime and Tuol Sleng prison. Key aspects of the project include training museum staff, digitizing the materials to high standards, creating searchable databases and indexes, and developing a public-facing website with crowd-sourcing capabilities to engage the Cambodian people. Challenges include the fragile materials, limited local skills and equipment, and ensuring the work is done to completion within a tight timeline and budget.
2017 Born Digital Legal Deposit Policies and Practices
Frederick Zarndt
This document summarizes the key details and findings of a survey conducted in 2014 and 2017 on born digital legal deposit policies and practices. The 2014 survey was sent to 20 national libraries and received responses from 17 libraries. It found that legal deposit laws varied widely, with Nordic countries leading in digital content capture while many others made no provision for digital. Only 7 countries addressed deposit of born-digital content. To update the survey, the authors expanded their team in 2017 and broadened the survey reach. The document reviews 17 previous related surveys from 2005-2016 on topics like audiovisual preservation, e-legal deposit, web archiving, and digital news preservation. It provides context on the goals and questions of each prior survey.
In 2015, three of the authors (Zarndt, McCain, Carner) surveyed the born digital content legal deposit policies and practices in 18 different countries and presented the results of the survey at the 2015 International News Media Conference hosted by the National Library of Sweden in Stockholm, Sweden, April 2015.
As a first step, the authors reviewed previous surveys about legal deposit and digital preservation. The authors updated and streamlined the 2015 survey in order to assess progress in creating or improving national policies and in implementing practices for preserving born digital content. The current survey consists of as many as 20 questions; which questions are asked depends on the respondent's previous answers.
More than 50 countries, as well as states in Australia, Germany and the USA, participated in the survey. The survey closed at the end of November 2017. The authors expect to repeat the survey periodically in order to assess progress in developing born digital legal deposit policy and implementing the policy in practice.
What did you say? interculture communication [20160308 phnom penh]
Frederick Zarndt
The single biggest problem in communication is the illusion it has taken place. George Bernard Shaw, Irish playwright, co-founder of London School of Economics, and Nobel Prize in Literature (1925).
Projects are about communication, communication, and communication. B. Elenbass in "Staging a project: Are you setting your project up for success?"
What one says to compatriots in face-to-face conversation is often misunderstood; imagine the possibilities for misunderstandings with someone from halfway around the world, natively speaking another language, and living in a different culture! In such circumstances how can you be sure that your collocutor has understood you in face-to-face (hard), telephone (harder), and email (hardest) conversations? Without being fully present in the conversation -- mindfully aware -- whether it's face-to-face, by Skype or phone, or through email, successful communication is difficult, even more so for intercultural communication.
The ubiquity of English facilitates basic communication, but its use as a common language frequently disguises cultural differences. Furthermore, to say that English (or any other language) can be ambiguous, is an understatement. But regardless of language, clear communication is essential for success in any collaborative undertaking whether done by a small co-located group or by a globally dispersed team.
This tutorial teaches mindful communication and describes frameworks useful in understanding cultural differences and gives real-life examples of misunderstandings due to such differences. Expect to take away practical tools to understand your own cultural biases and in-class practice mindful communication with your colleagues from other cultures as well as your own. You will also learn about frameworks for understanding other cultures based on work by Geert Hofstede, Fons Trompenaars, and others as well as on the presenter's own experiences.
Coronado public library digital newspapers workshop local partnerships [oct 2...
Frederick Zarndt
Using digitized historical newspapers for genealogical research
Brian Geiger, California Digital Newspaper Collection
Frederick Zarndt, IFLA Governing Board
1. Introductory remarks: Who we are; focus on freely available collections and especially those that allow researchers to create accounts; numerous sites they can pay to access but we won't spend much time on them
2. Only small percentage of surviving newspapers have been digitized
3. How newspapers are digitized. Focusing especially on OCR, if it's not OCR'ed well it's not discoverable
4. How Coronado newspapers were digitized. CDNC's work with the public library, Coronado Public Library's work with the publisher, the process of scanning the film and processing the images, etc.
5. Free vs. Pay. 2 kinds of digitized newspaper archives: 1) publicly funded and available for free, 2) commercial sites you pay to access. Dozens or even hundreds of public sites, from small institutional to national.
6. Google won't always get you what you want
7. Basic search using Elephind: What elephind is. Search "Abraham Lincoln" and explain what they see. Described "facets"
8. CDNC advanced search
9. Collecting What You Find: Right-click features in the CDNC
10. Collecting What You Find: CDNC user accounts
11. Interacting with Content: CDNC
12. Interacting with Content: Tagging and commenting in CDNC
Coronado public library digital newspapers workshop [Oct 2016]
Frederick Zarndt
Using digitized historical newspapers for genealogical research
Brian Geiger, California Digital Newspaper Collection
Frederick Zarndt, IFLA Governing Board
1. Introductory remarks: Who we are; focus on freely available collections and especially those that allow researchers to create accounts; numerous sites they can pay to access but we won't spend much time on them
2. Only small percentage of surviving newspapers have been digitized
3. How newspapers are digitized. Focusing especially on OCR, if it's not OCR'ed well it's not discoverable
4. How Coronado newspapers were digitized. CDNC's work with the public library, Coronado Public Library's work with the publisher, the process of scanning the film and processing the images, etc.
5. Free vs. Pay. 2 kinds of digitized newspaper archives: 1) publicly funded and available for free, 2) commercial sites you pay to access. Dozens or even hundreds of public sites, from small institutional to national.
6. Google won't always get you what you want
7. Basic search using Elephind: What elephind is. Search "Abraham Lincoln" and explain what they see. Described "facets"
8. CDNC advanced search
9. Collecting What You Find: Right-click features in the CDNC
10. Collecting What You Find: CDNC user accounts
11. Interacting with Content: CDNC
12. Interacting with Content: Tagging and commenting in CDNC
What did you say? mindful interculture communication [201608 icgse]
Frederick Zarndt
The single biggest problem in communication is the illusion it has taken place. George Bernard Shaw, Irish playwright, co-founder of London School of Economics, and Nobel Prize in Literature (1925).
Projects are about communication, communication, and communication. B. Elenbass in "Staging a project: Are you setting your project up for success?"
What one says to compatriots in face-to-face conversation is often misunderstood; imagine the possibilities for misunderstandings with someone from halfway around the world, natively speaking another language, and living in a different culture! In such circumstances how can you be sure that your collocutor has understood you in face-to-face (hard), telephone (harder), and email (hardest) conversations? Without being fully present in the conversation -- mindfully aware -- whether it's face-to-face, by Skype or phone, or through email, successful communication is difficult, even more so for intercultural communication.
The ubiquity of English facilitates basic communication, but its use as a common language frequently disguises cultural differences. Furthermore, to say that English (or any other language) can be ambiguous, is an understatement. But regardless of language, clear communication is essential for success in any collaborative undertaking whether done by a small co-located group or by a globally dispersed team.
This tutorial teaches mindful communication and describes frameworks useful in understanding cultural differences and gives real-life examples of misunderstandings due to such differences. Expect to take away practical tools to understand your own cultural biases and in-class practice mindful communication with your colleagues from other cultures as well as your own. You will also learn about frameworks for understanding other cultures based on work by Geert Hofstede, Fons Trompenaars, and others as well as on the presenter's own experiences.
Here Today, Gone within a Month: The Fleeting Life of Digital News
Frederick Zarndt
In 1989 on the shores of Montana's beautiful Flathead Lake, the owners of the weekly newspaper the Bigfork Eagle started TownNews.com to help community newspapers with developing technology. TownNews.com has since evolved into an integrated digital publishing and content management system used by more than 1600 newspaper, broadcast, magazine, and web-native publications in North America. TownNews.com is now headquartered on the banks of the mighty Mississippi river in Moline, Illinois.
Not long ago Marc Wilson, CEO of TownNews.com, noticed that of the 220,000+ e-edition pages posted on behalf of its customers at the beginning of the month, 210,000 were deleted by month's end.
What? The front page story about a local business being sold to an international corporation that I read online September 1 will be gone by September 30? As well as the story about my daughter's 1st place finish in the district field and track meet?
A 2014 national survey by the Reynolds Journalism Institute (RJI) of 70 digital-only and 406 hybrid (digital and print) newspapers conclusively showed that newspaper publishers also do not maintain archives of the content they produce. RJI found a dismal 12% of the "hybrid" newspapers reported even backing up their digital news content and fully 20% of the "digital-only" newspapers reported that they are backing up none of their content. Educopia Institute's 2012 and 2015 surveys with newspapers and libraries concur, and further demonstrate that the longstanding partner to the newspaper, the library, likewise is neither collecting nor preserving this digital content.
This leaves us with a bitter irony, that today, one can find stories published prior to 1922 in the Library of Congress's Chronicling America and other digitized, out-of-copyright newspaper collections but cannot, and never will be able to, read a story published online less than a month ago.
In this paper we look at how much news is published online that is never published in print or on more permanent media. We estimate how much online news is or will soon be forever lost because no one preserves it: not publishers, not libraries, not content management systems, and not the Internet Archive. We delve into some of the reasons why this content is not yet preserved, and we examine the persistent challenges of digital preservation and of digital curation of this content type. We then suggest a pathway forward, via some initial steps that journalists, producers, legislators, libraries, distributors, and readers may each take to begin to rectify this historical loss going forward.
Here Today, Gone within a Month: The Fleeting Life of Digital News
Frederick Zarndt
This document discusses the fleeting lifespan of digital news content. It notes that TownNews.com, which hosts digital content for over 1600 publications, found that 210,000 of the 220,000 digital news pages from the beginning of a month were deleted by the end of the month. Surveys have shown that few digital news producers actively preserve their content, with only 12% of hybrid print-digital newspapers backing up content and 20% of digital-only newspapers backing no content up at all. As a result, much recent digital news content is lost to researchers and the historical record. The document examines challenges to preserving born-digital news and suggests stakeholders like journalists, legislators, libraries and readers should take initial steps to address this problem.
An international survey of born digital legal deposit policies and practices ...
Frederick Zarndt
That news publication has changed dramatically since the advent of the Internet and the Web is no news to anyone. There are many examples of established news organizations that have either stopped printing newspapers or shifted to publishing news on websites or through social media such as Facebook and Twitter. There are even more examples of new news organizations that have never printed news on paper and are digital only.
To the authors' knowledge, every country has one or more legal deposit organizations tasked with preserving news for future generations. Legal deposit laws in some countries have been amended to include news that may never be instantiated on paper (born digital news). However, legal deposit laws are by no means universally amended and, even when such amendments have been made, their embodiment in practice varies widely.
As a follow-on to the paper Missing links: The digital news preservation discontinuity (http://www.ifla.org/node/8933) presented in August 2014 at IFLA News Media section satellite conference at the ITU Library in Geneva, Switzerland, the authors have surveyed cultural heritage organizations (libraries) around the world about their respective national born digital legal deposit policies and practices. We share the survey results and consider the ramifications of inadequate born digital news preservation policies and practice to future generations.
20140628 crowdsourcing, family history, and long tails for libraries [ala ann...
Frederick Zarndt
In all of its many flavors, crowdsourcing works. It works for cultural heritage organizations too. During this presentation we look at various aspects of crowdsourced OCR text correction, commenting, and tagging for digitized historical newspapers at the National Library of Australia's Trove, the California Digital Newspaper Collection (CDNC), and at the Cambridge Public Library in Cambridge, Massachusetts, as well as the astounding number of historical birth, death, marriage, census, and other records transcribed by "crowd" volunteers at Family Search. Some aspects include: demographics, experiences, motivation, quality, preferred data, economics and marketing. You will see that crowdsourcing is not only feasible but also practical and desirable. You will wonder why your own cultural heritage organization hasn't begun its own crowdsourcing project!
20131019 digital collections - if you build them will anyone visit [library 2...
Frederick Zarndt
This document discusses digital historical newspaper collections in libraries and their visibility on the internet. It finds that while libraries spend significant resources digitizing collections, the collections receive little internet traffic and have poor search engine results. Some key points made include:
- Historical newspaper collections are among the most used collections in libraries with digital texts but receive low percentages of overall website traffic.
- Searching sample collections for information on the Gallipoli campaign yields few or no results from the library collections in the first 100 Google/Google News search results.
- Simple changes like adding XML sitemaps and adjusting robots.txt files can significantly increase search engine indexing and traffic for digital collections.
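The last point can be illustrated with a minimal sketch, assuming a collection whose page URLs are already known. The `newspapers.example.org` addresses and the helper names `build_sitemap` and `build_robots_txt` are hypothetical and do not come from the presentation; the sketch only shows the general shape of a sitemap file and of a robots.txt that advertises it.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(page_urls):
    """Return a sitemap XML string with one <url><loc> entry per page URL."""
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for page_url in page_urls:
        url_el = ET.SubElement(urlset, "url")
        ET.SubElement(url_el, "loc").text = page_url
    return ET.tostring(urlset, encoding="unicode")


def build_robots_txt(sitemap_url):
    """Return a permissive robots.txt that points crawlers at the sitemap."""
    return "User-agent: *\nAllow: /\nSitemap: " + sitemap_url + "\n"


# Hypothetical URLs, for illustration only.
pages = [
    "https://newspapers.example.org/issue/1915-04-25",
    "https://newspapers.example.org/issue/1915-04-26",
]
sitemap_xml = build_sitemap(pages)
robots_txt = build_robots_txt("https://newspapers.example.org/sitemap.xml")

print(sitemap_xml)
print(robots_txt)
```

A real collection would generate the URL list from its own database and would likely need to split it across several sitemap files, since the protocol caps the number of URLs per file.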
20130903 what did you say? interculture communication [hamburg]
Frederick Zarndt
This document discusses intercultural communication and misunderstandings. It provides quotes and principles about the importance of effective communication to build understanding between people from different cultures and avoid assumptions. It notes that a lack of communication or poor communication can lead to more assumptions and misunderstandings.
201308 wlic standards committee zarndt et al the alto editorial board collabo...
Frederick Zarndt
The document provides information about the ALTO Editorial Board, which maintains the ALTO (Analyzed Layout and Text Object) XML standard for describing page layout and content metadata. The board was established in 2009 and is comprised of members from libraries and organizations around the world. The board aims to promote ALTO usage and ensure the standard evolves to meet emerging needs. Meeting agendas, procedures, and examples of board members' motivations for participation are presented.
201308 wlic standards committee zarndt et al the alto editorial board collabo...
Frederick Zarndt
The document summarizes the history and operations of the ALTO Editorial Board. It describes ALTO as an XML standard for describing text layout in digitized documents. The board has an international membership representing major libraries. It meets regularly to review proposals to update the ALTO standard. The board follows a standardized process for submitting and reviewing proposals, with a designated member championing each proposal. It aims to balance functionality improvements with backward compatibility for digital library systems using ALTO.
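To make the idea of an XML description of text layout concrete, here is a small sketch that reads the words out of a simplified ALTO-style fragment. The fragment is hand-written for illustration and is an assumption, not taken from the presentation: real ALTO files declare a versioned XML namespace and carry many more elements and attributes. The function name `text_lines` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified, namespace-free ALTO-style fragment (illustrative only).
# Each String element carries one OCR word plus its position on the page.
ALTO_FRAGMENT = """
<alto>
  <Layout>
    <Page ID="P1">
      <PrintSpace>
        <TextBlock ID="TB1">
          <TextLine>
            <String CONTENT="Gallipoli" HPOS="100" VPOS="200" WIDTH="180" HEIGHT="40"/>
            <SP/>
            <String CONTENT="campaign" HPOS="300" VPOS="200" WIDTH="170" HEIGHT="40"/>
          </TextLine>
          <TextLine>
            <String CONTENT="begins" HPOS="100" VPOS="250" WIDTH="120" HEIGHT="40"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>
"""


def text_lines(alto_xml):
    """Return the text of each TextLine, joining its String words with spaces."""
    root = ET.fromstring(alto_xml.strip())
    lines = []
    for line in root.iter("TextLine"):
        words = [s.get("CONTENT", "") for s in line.iter("String")]
        lines.append(" ".join(words))
    return lines


lines = text_lines(ALTO_FRAGMENT)
print(lines)
```

Because each word keeps its own coordinates, a correction interface can highlight the word on the page image while a user edits its text, which is one reason backward compatibility of the format matters to digital library systems.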
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Safe Software
Are you ready to revolutionize how you handle data? Join us for a webinar where we'll bring you up to speed with the latest advancements in Generative AI technology and discover how leveraging FME with tools from giants like Google Gemini, Amazon, and Microsoft OpenAI can supercharge your workflow efficiency.
During the hour, we'll take you through:
Guest Speaker Segment with Hannah Barrington: Dive into the world of dynamic real estate marketing with Hannah, the Marketing Manager at Workspace Group. Hear firsthand how their team generates engaging descriptions for thousands of office units by integrating diverse data sources, from PDF floorplans to web pages, using FME transformers, like OpenAIVisionConnector and AnthropicVisionConnector. This use case will show you how GenAI can streamline content creation for marketing across the board.
Ollama Use Case: Learn how Scenario Specialist Dmitri Bagh has utilized Ollama within FME to input data, create custom models, and enhance security protocols. This segment will include demos to illustrate the full capabilities of FME in AI-driven processes.
Custom AI Models: Discover how to leverage FME to build personalized AI models using your data. Whether it's populating a model with local data for added security or integrating public AI tools, find out how FME facilitates a versatile and secure approach to AI.
We'll wrap up with a live Q&A session where you can engage with our experts on your specific use cases, and learn more about optimizing your data workflows with AI.
This webinar is ideal for professionals seeking to harness the power of AI within their data management systems while ensuring high levels of customization and security. Whether you're a novice or an expert, gain actionable insights and strategies to elevate your data processes. Join us to see how FME and AI can revolutionize how you work with data!
Ivanti's Patch Tuesday breakdown goes beyond patching your applications and brings you the intelligence and guidance needed to prioritize where to focus your attention first. Catch early analysis on our Ivanti blog, then join industry expert Chris Goettl for the Patch Tuesday Webinar Event. There we'll do a deep dive into each of the bulletins and give guidance on the risks associated with the newly-identified vulnerabilities.
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Malak Abu Hammad
Discover how MongoDB Atlas and vector search technology can revolutionize your application's search capabilities. This comprehensive presentation covers:
* What is Vector Search?
* Importance and benefits of vector search
* Practical use cases across various industries
* Step-by-step implementation guide
* Live demos with code snippets
* Enhancing LLM capabilities with vector search
* Best practices and optimization strategies
Perfect for developers, AI enthusiasts, and tech leaders. Learn how to leverage MongoDB Atlas to deliver highly relevant, context-aware search results, transforming your data retrieval process. Stay ahead in tech innovation and maximize the potential of your applications.
#MongoDB #VectorSearch #AI #SemanticSearch #TechInnovation #DataScience #LLM #MachineLearning #SearchTechnology
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc
How does your privacy program stack up against your peers? What challenges are privacy teams tackling and prioritizing in 2024?
In the fifth annual Global Privacy Benchmarks Survey, we asked over 1,800 global privacy professionals and business executives to share their perspectives on the current state of privacy inside and outside of their organizations. This year's report focused on emerging areas of importance for privacy and compliance professionals, including considerations and implications of Artificial Intelligence (AI) technologies, building brand trust, and different approaches for achieving higher privacy competence scores.
See how organizational priorities and strategic approaches to data security and privacy are evolving around the globe.
This webinar will review:
- The top 10 privacy insights from the fifth annual Global Privacy Benchmarks Survey
- The top challenges for privacy leaders, practitioners, and organizations in 2024
- Key themes to consider in developing and maintaining your privacy program
Building Production Ready Search Pipelines with Spark and Milvus
Zilliz
Spark is a widely used ETL tool for processing, indexing and ingesting data into the serving stack for search. Milvus is a production-ready open-source vector database. In this talk we will show how to use Spark to process unstructured data to extract vector representations, and push the vectors to the Milvus vector database for search serving.
Main news related to the CCS TSI 2023 (2023/1695)
Jakub Marek
An English translation of the slides for the talk I gave on the main changes brought by CCS TSI 2023 at the biggest Czech conference on communications and signalling systems on railways, held at the Clarion Hotel Olomouc from 7th to 9th November 2023 (konferenceszt.cz). It was attended by around 500 participants and 200 online followers.
The original Czech version of the presentation can be found here: https://www.slideshare.net/slideshow/hlavni-novinky-souvisejici-s-ccs-tsi-2023-2023-1695/269688092 .
The videorecording (in Czech) from the presentation is available here: https://youtu.be/WzjJWm4IyPk?si=SImb06tuXGb30BEH .
Project Management Semester Long Project - Acuity
jpupo2018
Acuity is an innovative learning app designed to transform the way you engage with knowledge. Powered by AI technology, Acuity takes complex topics and distills them into concise, interactive summaries that are easy to read & understand. Whether you're exploring the depths of quantum mechanics or seeking insight into historical events, Acuity provides the key information you need without the burden of lengthy texts.
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Wask
https://www.wask.co/ebooks/digital-marketing-trends-in-2024
Feeling lost in the digital marketing whirlwind of 2024? Technology is changing, consumer habits are evolving, and staying ahead of the curve feels like a never-ending pursuit. This e-book is your compass. Dive into actionable insights to handle the complexities of modern marketing. From hyper-personalization to the power of user-generated content, learn how to build long-term relationships with your audience and unlock the secrets to success in the ever-shifting digital landscape.
5th LF Energy Power Grid Model Meet-up Slides
DanBrown980551
5th Power Grid Model Meet-up
It is with great pleasure that we extend to you an invitation to the 5th Power Grid Model Meet-up, scheduled for 6th June 2024. This event will adopt a hybrid format, allowing participants to join us either through an online Microsoft Teams session or in person at TU/e located at Den Dolech 2, Eindhoven, Netherlands. The meet-up will be hosted by Eindhoven University of Technology (TU/e), a research university specializing in engineering science & technology.
Power Grid Model
The global energy transition is placing new and unprecedented demands on Distribution System Operators (DSOs). Alongside upgrades to grid capacity, processes such as digitization, capacity optimization, and congestion management are becoming vital for delivering reliable services.
Power Grid Model is an open source project from Linux Foundation Energy and provides a calculation engine that is increasingly essential for DSOs. It offers a standards-based foundation enabling real-time power systems analysis, simulations of electrical power grids, and sophisticated what-if analysis. In addition, it enables in-depth studies and analysis of the electrical power grid's behavior and performance. This comprehensive model incorporates essential factors such as power generation capacity, electrical losses, voltage levels, power flows, and system stability.
Power Grid Model is currently being applied in a wide variety of use cases, including grid planning, expansion, reliability, and congestion studies. It can also help in analyzing the impact of renewable energy integration, assessing the effects of disturbances or faults, and developing strategies for grid control and optimization.
What to expect
For the upcoming meetup we are organizing, we have an exciting lineup of activities planned:
- Insightful presentations covering two practical applications of the Power Grid Model.
- An update on the latest advancements in Power Grid Model technology during the first and second quarters of 2024.
- An interactive brainstorming session to discuss and propose new feature requests.
- An opportunity to connect with fellow Power Grid Model enthusiasts and users.
Have you ever been confused by the myriad of choices offered by AWS for hosting a website or an API?
Lambda, Elastic Beanstalk, Lightsail, Amplify, S3 (and more!) can each host websites + APIs. But which one should we choose?
Which one is cheapest? Which one is fastest? Which one will scale to meet our needs?
Join me in this session as we dive into each AWS hosting service to determine which one is best for your scenario and explain why!
How to Get CNIC Information System with Paksim Ga.pptx
danishmna97
Pakdata Cf is a groundbreaking system designed to streamline and facilitate access to CNIC information. This innovative platform leverages advanced technology to provide users with efficient and secure access to their CNIC details.
Your One-Stop Shop for Python Success: Top 10 US Python Development Providersakankshawande
Â
Simplify your search for a reliable Python development partner! This list presents the top 10 trusted US providers offering comprehensive Python development services, ensuring your project's success from conception to completion.
HCL Notes and Domino License Cost Reduction in the World of DLAUpanagenda
Â
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-and-domino-license-cost-reduction-in-the-world-of-dlau/
The introduction of DLAU and the CCB & CCX licensing model caused quite a stir in the HCL community. As a Notes and Domino customer, you may have faced challenges with unexpected user counts and license costs. You probably have questions on how this new licensing approach works and how to benefit from it. Most importantly, you likely have budget constraints and want to save money where possible. Donât worry, we can help with all of this!
Weâll show you how to fix common misconfigurations that cause higher-than-expected user counts, and how to identify accounts which you can deactivate to save money. There are also frequent patterns that can cause unnecessary cost, like using a person document instead of a mail-in for shared mailboxes. Weâll provide examples and solutions for those as well. And naturally weâll explain the new licensing model.
Join HCL Ambassador Marc Thomas in this webinar with a special guest appearance from Franz Walder. It will give you the tools and know-how to stay on top of what is going on with Domino licensing. You will be able lower your cost through an optimized configuration and keep it low going forward.
These topics will be covered
- Reducing license cost by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how to best utilize it
- Tips for common problem areas, like team mailboxes, functional/test users, etc
- Practical examples and best practices to implement right away
Skybuffer SAM4U tool for SAP license adoptionTatiana Kojar
Â
Manage and optimize your license adoption and consumption with SAM4U, an SAP free customer software asset management tool.
SAM4U, an SAP complimentary software asset management tool for customers, delivers a detailed and well-structured overview of license inventory and usage with a user-friendly interface. We offer a hosted, cost-effective, and performance-optimized SAM4U setup in the Skybuffer Cloud environment. You retain ownership of the system and data, while we manage the ABAP 7.58 infrastructure, ensuring fixed Total Cost of Ownership (TCO) and exceptional services through the SAP Fiori interface.
Generating privacy-protected synthetic data using Secludy and MilvusZilliz
Â
During this demo, the founders of Secludy will demonstrate how their system utilizes Milvus to store and manipulate embeddings for generating privacy-protected synthetic data. Their approach not only maintains the confidentiality of the original data but also enhances the utility and scalability of LLMs under privacy constraints. Attendees, including machine learning engineers, data scientists, and data managers, will witness first-hand how Secludy's integration with Milvus empowers organizations to harness the power of LLMs securely and efficiently.
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUpanagenda
Â
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-und-domino-lizenzkostenreduzierung-in-der-welt-von-dlau/
DLAU und die Lizenzen nach dem CCB- und CCX-Modell sind fĂźr viele in der HCL-Community seit letztem Jahr ein heiĂes Thema. Als Notes- oder Domino-Kunde haben Sie vielleicht mit unerwartet hohen Benutzerzahlen und LizenzgebĂźhren zu kämpfen. Sie fragen sich vielleicht, wie diese neue Art der Lizenzierung funktioniert und welchen Nutzen sie Ihnen bringt. Vor allem wollen Sie sicherlich Ihr Budget einhalten und Kosten sparen, wo immer mĂśglich. Das verstehen wir und wir mĂśchten Ihnen dabei helfen!
Wir erklären Ihnen, wie Sie häufige Konfigurationsprobleme lÜsen kÜnnen, die dazu fßhren kÜnnen, dass mehr Benutzer gezählt werden als nÜtig, und wie Sie ßberflßssige oder ungenutzte Konten identifizieren und entfernen kÜnnen, um Geld zu sparen. Es gibt auch einige Ansätze, die zu unnÜtigen Ausgaben fßhren kÜnnen, z. B. wenn ein Personendokument anstelle eines Mail-Ins fßr geteilte Mailboxen verwendet wird. Wir zeigen Ihnen solche Fälle und deren LÜsungen. Und natßrlich erklären wir Ihnen das neue Lizenzmodell.
Nehmen Sie an diesem Webinar teil, bei dem HCL-Ambassador Marc Thomas und Gastredner Franz Walder Ihnen diese neue Welt näherbringen. Es vermittelt Ihnen die Tools und das Know-how, um den Ăberblick zu bewahren. Sie werden in der Lage sein, Ihre Kosten durch eine optimierte Domino-Konfiguration zu reduzieren und auch in Zukunft gering zu halten.
Diese Themen werden behandelt
- Reduzierung der Lizenzkosten durch Auffinden und Beheben von Fehlkonfigurationen und ĂźberflĂźssigen Konten
- Wie funktionieren CCB- und CCX-Lizenzen wirklich?
- Verstehen des DLAU-Tools und wie man es am besten nutzt
- Tipps fßr häufige Problembereiche, wie z. B. Team-Postfächer, Funktions-/Testbenutzer usw.
- Praxisbeispiele und Best Practices zum sofortigen Umsetzen
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Speck&Tech
Â
ABSTRACT: A prima vista, un mattoncino Lego e la backdoor XZ potrebbero avere in comune il fatto di essere entrambi blocchi di costruzione, o dipendenze di progetti creativi e software. La realtà è che un mattoncino Lego e il caso della backdoor XZ hanno molto di piÚ di tutto ciò in comune.
Partecipate alla presentazione per immergervi in una storia di interoperabilitĂ , standard e formati aperti, per poi discutere del ruolo importante che i contributori hanno in una comunitĂ open source sostenibile.
BIO: Sostenitrice del software libero e dei formati standard e aperti. à stata un membro attivo dei progetti Fedora e openSUSE e ha co-fondato l'Associazione LibreItalia dove è stata coinvolta in diversi eventi, migrazioni e formazione relativi a LibreOffice. In precedenza ha lavorato a migrazioni e corsi di formazione su LibreOffice per diverse amministrazioni pubbliche e privati. Da gennaio 2020 lavora in SUSE come Software Release Engineer per Uyuni e SUSE Manager e quando non segue la sua passione per i computer e per Geeko coltiva la sua curiosità per l'astronomia (da cui deriva il suo nickname deneb_alpha).
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Â
20121105 no tempest in my teapot [dlf forum denver]
1. No tempest in my teapot:
Analysis of Crowdsourced Data
and User Experiences at the
California Digital Newspaper
Collection
Brian Geiger
Director, Center for Bibliographical Studies and Research
California Digital Newspaper Collection
Frederick Zarndt
Chair, IFLA Newspapers Section
Photo held by John Oxley Library, State Library of Queensland. Original from
Courier-mail, Brisbane, Queensland, Australia.
3. The Wisdom of Crowds
In 2004 James Surowiecki published "The Wisdom
of Crowds: Why the Many Are Smarter Than the
Few and How Collective Wisdom Shapes Business,
Economies, Societies and Nations". In it he asserts
that a crowd of people who are diverse,
independent, and decentralized usually makes
better judgments or decisions than any single
person.
5. A Google advanced search for
"crowdsourcing" from 1-Jun-2006, the date
of publication of Jeff Howe's Wired magazine
article, to 1-Jun-2007 gives 44,600 hits.
A date range of 1-Jun-2011 to 1-Jun-2012 gives
2,680,000 hits.
Searches used the Internet Archive's Wayback Machine
6. Crowdsourcing is a process that
involves outsourcing tasks to a distributed
group of people. ... the difference between
crowdsourcing and ordinary outsourcing is
that a task or problem is outsourced to an
undefined public rather than a specific
body, such as paid employees.
Wikipedia contributors, "Crowdsourcing," Wikipedia, The Free Encyclopedia, http://en.wikipedia.org/wiki/Crowdsourcing
(accessed June 1, 2012)
7. Crowdsourcing is a type of participative online activity in
which an individual, an institution, a non-profit
organization, or company proposes to a group of individuals
of varying knowledge, heterogeneity, and number, via a
flexible open call, the voluntary undertaking of a task. The
undertaking of the task, of variable complexity and
modularity, and in which the crowd should participate
bringing their work, money, knowledge and/or experience,
always entails mutual benefit. The user will receive the
satisfaction of a given type of need, be it economic, social
recognition, self-esteem, or the development of individual
skills, while the crowdsourcer will obtain and utilize to their
advantage that what the user has brought to the venture,
whose form will depend on the type of activity undertaken.
Enrique Estellés-Arolas and Fernando González-Ladrón-de-Guevara. Towards an integrated crowdsourcing definition.
Journal of Information Science XX(X). 2012. pp. 1-14.
8. crowd*
crowdcollaboration, crowdsourcing, crowdfunding, citizen science, crowdcasting, crowdvoting
9. what is Alexa?
• Alexa collects and analyzes Internet data for purposes of web analytics. Web analytics is
the measurement, collection, analysis and reporting of Internet data for the purposes of
understanding and optimizing web usage. Alexa is now a subsidiary of Amazon.
• Alexa was founded in 1996 by Brewster Kahle (Internet Archive) and Bruce Gilliat.
• Alexa's operations include archiving webpages as they are crawled. This database served
as the basis for the creation of the Internet Archive accessible through the Wayback
Machine.
• Alexa continually crawls all publicly available websites to create a series of snapshots of
the web.
• Alexa gathers information from a variety of sources to provide key statistics about each
site on the web, for example Traffic Rank, number of PageViews, site Speed, and
Bounce Rate. This information is derived from Alexa toolbar users (~6,000,000
worldwide).
10. definitions
• A PageView is a request for a file whose type is defined as a page.
• A Unique Visitor is a uniquely identified client generating requests on the
web server or viewing pages within a defined time period (i.e. day, week or
month). A Unique Visitor counts once within the timescale.
• A Visit is a series of page requests from the same uniquely identified client
with a time of no more than 30 minutes between each page request.
• Bounce Rate is the percentage of visits where the visitor enters and exits at
the same page without visiting any other pages on the site in between.
• World | Country Rank is a function of the average daily unique visits and
the number of unique pages requested.
definitions adapted from Wikipedia http://en.wikipedia.org/wiki/Web_analytics
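The definitions above can be sketched in a few lines of Python. This is only an illustration, not how Alexa computes its statistics: the request log format (client id, timestamp, page), the function names, and the sample data are all assumptions, and a bounce is taken to be a visit with exactly one page request.

```python
from datetime import datetime, timedelta

VISIT_TIMEOUT = timedelta(minutes=30)  # maximum gap between requests within one visit


def split_into_visits(requests):
    """Group (client_id, timestamp, page) tuples into visits.

    A new visit starts when a client's previous request is more than
    30 minutes old, following the definition of a Visit above.
    """
    visits = []
    last_seen = {}  # client_id -> (time of last request, index of its current visit)
    for client, ts, page in sorted(requests, key=lambda r: (r[0], r[1])):
        prev = last_seen.get(client)
        if prev is None or ts - prev[0] > VISIT_TIMEOUT:
            visits.append({"client": client, "pages": [page]})
            idx = len(visits) - 1
        else:
            idx = prev[1]
            visits[idx]["pages"].append(page)
        last_seen[client] = (ts, idx)
    return visits


def unique_visitors(visits):
    """Each uniquely identified client counts once."""
    return len({v["client"] for v in visits})


def bounce_rate(visits):
    """Percentage of visits consisting of a single page request."""
    if not visits:
        return 0.0
    bounces = sum(1 for v in visits if len(v["pages"]) == 1)
    return 100.0 * bounces / len(visits)


# Invented sample log, for illustration only.
t0 = datetime(2012, 6, 11, 9, 0)
sample_requests = [
    ("a", t0, "/home"),
    ("a", t0 + timedelta(minutes=10), "/search"),
    ("a", t0 + timedelta(hours=2), "/home"),       # gap > 30 min: new visit
    ("b", t0 + timedelta(minutes=5), "/issue/1"),
]
sample_visits = split_into_visits(sample_requests)
print(len(sample_visits), unique_visitors(sample_visits), bounce_rate(sample_visits))
```

In the invented log, client "a" makes two visits and client "b" one, so two of the three visits are single-page bounces.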
11. crowdsourcing
Amazon Mechanical Turk was launched Nov 2005
Alexa global rank of Amazon Mechanical Turk (13-Jun-2012): 6,022
13. crowdvoting
Iowa Electronic Market was 1st
launched in 1995
Alexa global traffic rank of Iowa
Electronic Market (6-Aug-2012):
11,290
Alexa US traffic rank of Iowa
Electronic Market (6-Aug-2012):
3,923
14. citizen science
Galaxy Zoo was 1st launched July 2007
Alexa global traffic rank of Galaxy Zoo (13-Jun-2012): 557,766
15. crowdfunding
Kickstarter was 1st launched in 2008
Alexa global traffic rank of Kickstarter (6-Aug-2012): 752
27,528 projects successfully funded with more than USD $254,000,000
17. Wikipedia
• Began 2001
• Now in 285 languages
• 3,900,000+ articles in English, 1,400,000+ in German, 1,250,000+ in
French, 1,050,000 in Dutch
• 40 Wikipedia languages with more than 100,000 articles
• 112 Wikipedia languages with more than 10,000 articles
• 400,000,000 unique visitors per month
• 85,000 active contributors
• Alexa global traffic rank: #6 in worldwide web traffic
18.
19. Family Search Indexing was 1st launched (beta) 2004
Alexa global / country traffic rank of FamilySearch (13-Jun-2012): 4,352 / 1,357
20. • Started (beta) 2004
• More than 780,000 worldwide registered volunteers
from ~25 countries index records relevant to family
history
• Approximately 100,000 active volunteers each month
• UI in Chinese, English, German, French, Italian,
Japanese, Korean, Portuguese, and Russian
• Blind double-key entry with arbitration / reconciliation
• More than 1,500,088,741 records indexed (July 2012)
• Accuracy typically > 99.95%
21. Project Gutenberg was 1st launched Dec 1971
Alexa global traffic rank of Project Gutenberg (13-Jun-2012): 5,744
22. • Started Dec 1971
• Worldwide volunteers transcribe or proofread OCR'd
public domain books through Distributed Proofreaders
• 40,000 books completed (July 2012)
• Partner / affiliated projects for Australia, Canada,
Europe, Germany, Luxembourg, Philippines, Runeberg
(Nordic literature), Russia, Taiwan
23. Alexa global / country traffic rank of National Library of Australia (31-Oct-2012): 15,519 / 406
Trove gets ~72% of all National Library web traffic.
24. National Library of
Australia
• Online since 2008
• 7,200,000+ pages
• Top text corrector 1,250,000 lines (June 2012)
• 2,450,000+ lines corrected each month (average
for 1st 6 months 2012)
• 68,908,757 lines corrected as of July 2012, up
from 42,411,468 lines corrected July 2011.
• 63,613 total registered users (July 2012)
• 4,146 active users (June 2012)
25. Alexa global / country traffic rank of National Library of Finland
2,535,854 (31-Oct-2012) / 199 (2-Apr-2012)
26. National Library of
Finland
• Digitalkoot is a project to improve OCR text in
digitized newspapers by playing games!
• Digitalkoot is a collaboration between the National
Library and Microtask
• Players correct OCR text by playing Myyräsillassa
(Mole Bridge) or Myyräjahdissa (Mole Hunt)
• National Library has 4,000,000+ digitized pages
• 109,321 registered players (October 2012)
• Since February 2011 8,024,530 micro-tasks have
been completed
27. Alexa global / country traffic rank of UC Riverside (31-Oct-2012): 12,439 / 4,717
CDNC gets ~1.84% of all UC Riverside web traffic.
28. California Digital
Newspaper Collection
• CDNC began digitizing newspapers in 2005 as
part of NDNP
• Newspapers digitized to article-level as well as
to page-level as required by NDNP
• Hosted on Veridian beginning 2009
• Collection size 55,970 issues, 495,175 pages,
5,658,224 articles, 498,000,000+ lines
29. OCR text correction
• OCR text correction added August 2011
• Corrections are done line by line
• ~578,000+ lines of text corrected (Oct 2012)
• ~1.1% of the collection corrected, 98.9% to go!
• Top corrector 243,000 lines > 2x 2nd corrector
31. uncorrected OCR accuracy by newspaper title
Title                                                              OCR character accuracy   ~OCR word accuracy*
PRP  Pacific Rural Press 1871 - 1922                               92.6%                    68.1%
SFC  San Francisco Call 1890 - 1913                                92.6%                    68.1%
LAH  Los Angeles Herald 1873 - 1910                                88.7%                    54.9%
LH   Livermore Herald 1877 - 1899                                  88.6%                    54.6%
DAC  Daily Alta California 1841 - 1891                             88.2%                    53.4%
CFJ  California Farmer and Journal of Useful Sciences 1855 - 1880  86.5%                    48.4%
SN   Sausalito News 1885 - 1922                                    70.4%                    17.3%
*Word accuracy assumes average word length is 5 characters
32. OCR accuracy by newspaper title
Title                                                              OCR character accuracy   Corrected accuracy
PRP  Pacific Rural Press 1871 - 1922                               92.6%                    99.3%
SFC  San Francisco Call 1890 - 1913                                92.6%                    99.6%
LAH  Los Angeles Herald 1873 - 1910                                88.7%                    99.1%
LH   Livermore Herald 1877 - 1899                                  88.6%                    99.9%
DAC  Daily Alta California 1841 - 1891                             88.2%                    99.9%
CFJ  California Farmer and Journal of Useful Sciences 1855 - 1880  86.5%                    99.8%
SN   Sausalito News 1885 - 1922                                    70.4%                    100.0%
33. corrected accuracy by newspaper title
Title             OCR character accuracy   ~OCR word accuracy*   Corrected accuracy   ~Corrected word accuracy*
PRP 1871 - 1922   92.6%                    68.1%                 99.3%                96.5%
SFC 1890 - 1913   92.6%                    68.1%                 99.6%                98.0%
LAH 1873 - 1910   88.7%                    54.9%                 99.1%                95.6%
LH  1877 - 1899   88.6%                    54.6%                 99.9%                99.5%
DAC 1841 - 1891   88.2%                    53.4%                 99.9%                99.5%
CFJ 1855 - 1880   86.5%                    48.4%                 98.3%                91.8%
SN  1885 - 1922   70.4%                    17.3%                 100.0%               100.0%
*Word accuracy assumes average word length is 5 characters
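The word accuracy columns follow from the footnote's assumption: if every character is recognized correctly with the same independent probability, a 5-character word is correct with that probability raised to the 5th power. A minimal sketch (the function name and the independence assumption are illustrative, not part of the original slides):

```python
def word_accuracy(char_accuracy, word_length=5):
    """Approximate word accuracy from character accuracy.

    Assumes every character in a word is recognized independently with
    the same probability, and that words average `word_length` characters.
    """
    return char_accuracy ** word_length


# Character accuracies taken from the tables above.
for title, char_acc in [("PRP", 0.926), ("SN", 0.704), ("PRP corrected", 0.993)]:
    print(f"{title}: {char_acc:.1%} character -> {word_accuracy(char_acc):.1%} word")
```

Real OCR errors tend to cluster on damaged regions of a page, so this is a rough estimate rather than a measured word accuracy.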
34. correction accuracy by user
User   Average OCR accuracy   Correction accuracy
A      70.4%                  100.0%
B      87.1%                  99.5%
C      95.4%                  99.5%
D      86.5%                  98.3%
E      95.3%                  100.0%
F      91.0%                  100.0%
G      91.0%                  99.8%
H      90.5%                  99.0%
I      96.6%                  99.8%
J      94.8%                  100.0%
K      86.8%                  99.3%
35. the long tail of crowdsourced OCR text correction*
A probability distribution has a long tail if a larger
share of the population rests within its tail than it would
under a normal distribution.
The most productive users are a small fraction
of the total user population yet account for ~50% of total
production. Said a different way, the much larger group of
individually less productive users is, taken together,
as important as the most productive users.
*The phrase "long tail" was popularized by Chris Anderson in the October 2004 Wired magazine article The Long Tail
and by Clay Shirky's February 2003 essay "Power laws, web logs, and inequality".
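One way to check for this pattern in correction data is to count how many of the most productive correctors it takes to reach 50% of all corrected lines. The sketch below assumes a plain list of lines corrected per user; the sample numbers are invented for illustration and are not CDNC or NLA data.

```python
def users_for_share(lines_per_user, share=0.5):
    """Smallest number of top correctors whose lines reach `share` of the total."""
    counts = sorted(lines_per_user, reverse=True)
    target = share * sum(counts)
    running = 0
    for n, count in enumerate(counts, start=1):
        running += count
        if running >= target:
            return n
    return len(counts)


# Invented example: a few heavy correctors and a long tail of light ones.
sample = [243_000, 110_000, 60_000, 40_000, 30_000] + [500] * 190
top = users_for_share(sample)
print(f"{top} of {len(sample)} users account for half of {sum(sample):,} lines")
```

In this made-up sample, 2 of 195 users produce half of the corrected lines, while the remaining 193 users together produce the other half.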
36. OCR text correction long tails
[Two charts, each with the 50% mark of total lines corrected indicated:
CDNC lines corrected by text corrector (top corrector 242,965 lines);
NLA lines corrected by text corrector (top corrector 1,456,906 lines)]
37. Motivation
Graphic from Kaufmann et al. "More than fun and money. Worker Motivation
in Crowdsourcing - A Study on Mechanical Turk."
38. Wisdom of crowds
Diversity: Each person should have private information even if it's just an eccentric interpretation of the
known facts.
Independence: People's opinions aren't determined by the opinions of those around them.
Decentralization: People are able to specialize and draw on local knowledge.
Aggregation: Some mechanism exists for turning private judgments into a collective decision.
James Surowiecki, The Wisdom of Crowds: Why the Many Are Smarter Than the Few and How Collective
Wisdom Shapes Business, Economies, Societies and Nations, Anchor Books, New York, 2005.
39. Cognitive surplus
... people are learning to use their free time for creative
activities rather than consumptive ones [such as watching
TV] ...
... the total human cognitive effort in creating all of
Wikipedia in every language is about one hundred million
hours ...
... Americans alone watch two hundred billion hours of TV
every year, or enough time, if it would be devoted to projects
similar to Wikipedia, to create about 2000 of them ...
Clay Shirky. Cognitive surplus: Creativity and generosity in a connected age. Penguin Press. New York. 2010.
40. Motivation
Genealogists and family historians
• National Library of Australia's 2012 Trove
status report showed that ~50% of Trove users
are family historians
• National Library of New Zealand survey found
that ~50% of PapersPast users are genealogists
• California Digital Newspaper Collection spring
2012 survey discovered that ~70% of its users
are genealogists; 75% are 50 years old or older
• A Utah Digital Newspapers survey showed that
72% of its users are genealogists
41. Motivation
Trove users' report
• "I enjoy the correction - it's a great way to learn more
about past history and things of interest whilst doing a
'service to the community' by correcting text for the benefit
of others."
• "I have recently retired from IT and thought that I could be
of some assistance to the project. It benefits me and other
people. It helps with family research."
From Rose Holley in "Many Hands Make Light Work." National Library of Australia March 2009.
42. Motivation
CDNC users' report
"I am interested in all kinds of history. I have pursued genealogy
as a hobby for many years. I correct text at CDNC because I see
it as a constructive way to contribute to a worthwhile project.
Because I am interested in history, I enjoy it."
Wesley, California
Personal communications with CDNC text correctors.
43. Motivation
CDNC users' report
"I only correct the text on articles of local interest - nothing at
state, national or international level, no advertisements, etc.
The objective is to be able to help researchers to locate local
people, places, organizations and events using the on-line
search at CDNC. I correct local news & gossip, personal items,
real estate transactions, superior court proceedings, county and
local board of supervisors meetings, obituaries, birth notices,
marriages, yachting news, etc."
Ann, California
Personal communications with CDNC text correctors.
44. Motivation
CDNC users' report
"I am correcting text for the Coronado Tent City Program for
1903. It is important to correct any problems with personal
names and other information so that researchers will be able
to search by keyword and be assured of retrieving desired
results. ... type fonts cause a great deal of difficulty in
digitizing the text and can cause problems for searchers. Also,
many of the guests' names at Tent City and Hotel Del
Coronado were taken from the registration books and reported
in the Program. This led to many problems in spelling of last
names and the editors were not careful to be consistent in the
spellings. This Program is an important resource since it
provides an excellent picture of daily life in Tent City and
captures much of the history of Coronado itself."
Gene, California
Personal communications with CDNC text correctors.
45. Motivation
CDNC users' report
"I have always been interested in history, especially the
development of the American West, and nothing brings it alive
better than newspapers of the time. I believe them to be an
invaluable source of knowledge for us and future generations."
David, United Kingdom
Personal communications with CDNC text correctors.
46. Motivation
CDNC users' report
"CDNC is an excellent source of information matching my
personal interest in such topics as sea history, development
of shipbuilding, clippers and other ships etc. ...
Unfortunately, the quality of text ... is rather poor I'm
afraid. This is why I started to do all corrections necessary
for myself ... and to leave the corrected text for use of
others. .... I am not doing this very regularly as this is just
my hobby and pleasure."
Jerzey, Poland
Personal communications with CDNC text correctors.
48. Website traffic
After a crowdsourcing transcription project of diaries from the
American War Between the States, Nicole Saylor, Head of Digital
Library Services at the University of Iowa Libraries, reported
"On June 9, 2011, we went from about 1000
daily hits to our digital library on a really good
day to more than 70,000."
Nicole Saylor interviewed by Trevor Owens. "Crowdsourcing the Civil War: Insights Interview with Nicole Saylor" blog post
at http://blogs.loc.gov/digitalpreservation/2011/12/crowdsourcing-the-civil-war-insights-interview-with-nicole-saylor/.
Dec 6, 2011.
49. Website traffic
Website traffic at CDNC before / after implementing crowdsourcing
                  before crowdsourcing        after crowdsourcing         change
                  11-Jun-2011 / 12-Jul-2011   11-Jun-2012 / 12-Jul-2012
visits            17,485                      21,488                      +22.9%
unique visitors   11,381                      13,376                      +17.5%
visit duration    9m 24s                      11m 7s                      +18.3%
bounce rate       51.3%                       44.5%                       -6.8 percentage points
pages per visit   14.9                        11.7                        -21.5%
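The change column can be reproduced from the before and after figures. Four of the rows are relative changes, while the bounce rate figure is the difference between two percentages. A short sketch (function names are illustrative):

```python
def relative_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return 100.0 * (after - before) / before


def point_change(before_pct, after_pct):
    """Difference between two percentages, in percentage points."""
    return after_pct - before_pct


# Figures from the CDNC traffic table; visit duration converted to seconds.
visits = relative_change(17_485, 21_488)
unique = relative_change(11_381, 13_376)
duration = relative_change(9 * 60 + 24, 11 * 60 + 7)
pages = relative_change(14.9, 11.7)
bounce_points = point_change(51.3, 44.5)
bounce_relative = relative_change(51.3, 44.5)

print(f"visits {visits:+.1f}%, unique visitors {unique:+.1f}%, "
      f"duration {duration:+.1f}%, pages per visit {pages:+.1f}%")
print(f"bounce rate {bounce_points:+.1f} points ({bounce_relative:+.1f}% relative)")
```

Expressed as a relative change, the drop in bounce rate is about 13.3%.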
51. Crowdsourcing
benefits
Public domain photo courtesy of US Navy
52. Economics
Financial value of outsourced OCR text correction
for newspapers?
The Assumptions
• 25 to 50 characters per line in a newspaper column:
Assume 40 characters per line (CDNC sample average)
• Outsourced text transcription or correction costs USD
$0.35 to $1.20 per 1000 characters: Assume $0.50
per 1000 characters
53. Economics
CDNC: 578,000 lines x 40 characters per line x
1/1000 x $0.50 = $11,560
NLA: 68,908,757 lines x 40 characters per line x
1/1000 x $0.50 = $1,378,175
54. Economics
Financial value of in-house OCR text
correction?
The Assumptions
• Correction takes 15 seconds per line
• Cost is hourly wage plus benefits of lowest level
employee, $10 for CDNC, $41.88* for Australia
*AUD $40.38 = USD $41.88 is the actual labor value assumed by the National Library of Australia to calculate avoided costs
due to crowdsourced OCR text correction in its 2012 Trove Status Report.
55. Economics
CDNC: 578,000 lines x 15 seconds per line x 1/3600 hrs
per second x $10.00 per hr = $24,083
NLA: 68,908,757 lines x 15 seconds per line x 1/3600
hrs per second x $41.88 per hr = $12,024,578
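Both estimates, outsourced and in-house, are simple products of the stated assumptions, so they are easy to rerun with other rates. A minimal sketch (function and parameter names are illustrative; defaults are the assumptions from the slides):

```python
def outsourced_cost(lines, chars_per_line=40, usd_per_1000_chars=0.50):
    """Cost of paying a vendor per 1000 characters corrected."""
    return lines * chars_per_line / 1000 * usd_per_1000_chars


def in_house_cost(lines, hourly_cost, seconds_per_line=15):
    """Cost of staff time at `hourly_cost` (wage plus benefits) per hour."""
    return lines * seconds_per_line / 3600 * hourly_cost


cdnc_lines = 578_000      # CDNC lines corrected (Oct 2012)
nla_lines = 68_908_757    # NLA lines corrected (July 2012)

print(f"CDNC outsourced: ${outsourced_cost(cdnc_lines):,.0f}")
print(f"NLA outsourced:  ${outsourced_cost(nla_lines):,.0f}")
print(f"CDNC in-house:   ${in_house_cost(cdnc_lines, hourly_cost=10.00):,.0f}")
print(f"NLA in-house:    ${in_house_cost(nla_lines, hourly_cost=41.88):,.0f}")
```

Changing a single assumption, for example the characters per line or the seconds per line, scales the result proportionally.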
56. Accuracy
"His Accuracy Depends on Ours!"
Office for Emergency Management. Office of War
Information. Domestic Operations Branch. Bureau of
Special Services. [Photo held at US National Archives and
Records Administration]
57. Accuracy
• Edwin Klijn (Koninklijke Bibliotheek, the Netherlands)
reports raw OCR character accuracies of 68% for early 20th
century newspapers
• Rose Holley (National Library of Australia) reports raw
OCR character accuracy varied from 71% to 98% on a
sample of Trove digitized newspapers
Edwin Klijn. "The current state-of-art in newspaper digitization." D-Lib Magazine. January/February 2008.
Rose Holley. "How good can it get? Analysing and improving OCR accuracy in large scale historic newspaper digitisation
programs." D-Lib Magazine. March/April 2009.
Public domain graphic courtesy of Wikimedia Commons.
58. Accuracy
Mapping texts* assesses digitization quality of digital
newspapers by comparing the number of words
recognized to the total number of words scanned
*Mapping texts is a collaboration between the University of North Texas and Stanford University aimed at experimenting
with new methods for finding and analyzing meaningful patterns embedded in massive collections of digital newspapers.
59. Accuracy
How does low text accuracy affect search recall?
The Facts
• Average uncorrected OCR character accuracy of the
CDNC sample data is ~89%
• Average length of an English word is 5 characters
• Average word accuracy is 89% x 89% x 89% x 89% x
89% = 55.8% - round up to 60% or 6 out of 10 words
correct
Public domain graphic courtesy of Wikimedia Commons.
60. Search recall no text correction
[Graphic: scattered instances of the word "ARNDT", marked as either
instances of "ARNDT" found or instances of "ARNDT" not found]
61. Accuracy
The Facts
• Average corrected character accuracy of the CDNC
sample data is ~99.4%
• Average word accuracy of CDNC corrected text is
99.4% x 99.4% x 99.4% x 99.4% x 99.4% = 97.0%
Public domain graphic courtesy of Wikimedia Commons.
62. Search recall with text correction
[Graphic: scattered instances of the word "ARNDT", marked as either
instances of "ARNDT" found or instances of "ARNDT" not found]
63. Accuracy
A search for "Arndt" at Chronicling America
gives 10,267 results*
• If Chronicling America text accuracy is 55.8% (same
as uncorrected CDNC sample), then 8,133 instances
of "Arndt" were not found
• If text accuracy is 97.0%, then 317 instances of
"Arndt" were not found
* Search performed 31 Oct 2012
Alexa global / country traffic rank of Library of Congress (31-Oct-2012): 4,056 / 1,317
Chronicling America gets ~7.1% of all Library of Congress web traffic.
Public domain graphic courtesy of Wikimedia Commons.
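The missed-instance estimates can be reproduced by treating word accuracy as search recall: if only a fraction of the true occurrences is found, the true total is roughly the number found divided by that fraction. This is a sketch under that assumption; the function name is illustrative.

```python
def estimated_missed(found, word_accuracy):
    """Estimate occurrences a search did not find.

    Assumes each true occurrence of the word is recognized correctly,
    and is therefore findable, with probability `word_accuracy`.
    """
    estimated_total = found / word_accuracy
    return estimated_total - found


found = 10_267  # "Arndt" results at Chronicling America, 31 Oct 2012

missed_uncorrected = estimated_missed(found, 0.558)
missed_corrected = estimated_missed(found, 0.970)
print(f"at 55.8% word accuracy: about {missed_uncorrected:,.0f} instances not found")
print(f"at 97.0% word accuracy: about {missed_corrected:,.0f} instances not found")
```

The results agree with the figures on the slide to within rounding.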
64. Hard-to-measure-but-shouldn't-be-overlooked
benefits
Public domain photo "A useful instruction for young sailors from the Royal Hospital
School, Greenwich" from the National Maritime Museum.
65. HTMBSBO benefit
"when someone transcribes a document, they are
actually better fulfilling the mission of a cultural
heritage organization than someone who simply stops
by to flip through the pages"
Paraphrased from Trevor Owens's Crowdstorming blog http://crowdstorming.wordpress.com/
66. HTMBSBO benefit
"in addition to increasing search accuracy or lowering
the costs of document transcription, crowdsourcing is
the single greatest advancement in getting people using
and interacting with library collections"
Paraphrased from Trevor Owens's Crowdstorming blog http://crowdstorming.wordpress.com/
67. Crowdsourcing considerations
• How to market / advertise
crowdsourcing?
• How to motivate
crowdsourcers?
• Is authentication / identity of
crowdsourcers an issue?
• How to administer
crowdsourced data?
Photo of Aleister Crowley [Public domain] from Wikimedia
Commons
68. Conclusions
• Lots of crowdsourcing in cultural heritage
organizations and elsewhere
• Benefits are multi-faceted: Economic, data
accuracy, patron engagement, increased web
traffic
Conclusion of the Sonata for piano #32, opus 111 by
Ludwig van Beethoven
69. Try crowdsourcing!
Correct California newspapers text
http://cdnc.ucr.edu
Correct Australian newspapers text
http://trove.nla.gov.au
Correct Cambridge MA newspapers text
http://bit.ly/cambridgepublic
Correct Russian language periodicals
http://bit.ly/russianperiodicals
Others soon to follow: Library of Virginia, University of Tennessee,
National Library of Singapore, ...
70. ?
Brian Geiger
bgeiger@ucr.edu
Frederick Zarndt
frederick@frederickzarndt.com
Photo held by John Oxley Library, State Library of Queensland. Original from
Courier-mail, Brisbane, Queensland, Australia.