This document summarizes a report on a public collaborative text correction project for digitized Australian historic newspapers. It discusses allowing users to tag, comment on, and correct optical character recognition errors in newspaper articles. Top contributors were able to correct over 100,000 articles. User feedback was positive and they requested more advanced searching and profiles. The project shows the power of public collaboration to improve digital collections.
Many Hands Make Light Work: Public Collaborative Text Correction in Australian Historic Newspapers. Keynote. April 2009
Thank you for inviting me to speak here today. Before I begin I would like to acknowledge the hard work of the ANDP team over the last 2 years. Our team was small, consisting of only 6 people, and we worked closely together with a shared vision and goal to achieve what I will show you today.
All the information I will speak about today is available on the ANDP website. The address is www.nla.gov.au/ndp. Under the project details tab are several papers, reports and all previous presentations. We have a high level of transparency with the program and this website has proved to be a useful information tool for the public, librarians and stakeholders. All information about titles to be digitised is available under the ‘selected titles’ tab.
The overall objective of the Australian Newspaper Digitisation Program is to improve access to Australian newspapers, focusing first on content that is out of copyright – so up until the end of 1954. Up until now people wishing to research historic Australian newspapers needed to go to libraries across Australia and scroll through reels of microfilm. This program aimed to provide an online service that will let people anywhere, anytime access these newspapers via the internet. The service is now available. It is free. You can full text search across every page of every newspaper in the service, including advertising, cartoons, letters to the editor as well as the news and sports articles.
Every state and territory library in Australia is involved in this national program. By 2011 we will have digitised 4 million newspaper pages, which is about 40 million articles. The Sydney Morning Herald will comprise 600,000 pages of this. Each state has selected a major daily newspaper to begin with. We are working in collaboration with the state and territory libraries to digitise these first 4 million pages, since many of them own the microfilm copies from which we create the digital images. Regional newspapers will be included from this year onwards. Regional titles are being contributed from libraries around Australia.
The program started 2 years ago and we have digitised 1.8 million pages from microfilm so far. It is a 2-step process. First, microfilm is scanned into digital images by our contractor in Sydney, and then the pages are sent to our contractor in India for Optical Character Recognition (OCR) processing. This makes them full text searchable. After quality assurance they are made available to the public through the Australian Newspapers service. The Beta service was released to the public in July 2008. It now contains 360,000 pages (3.5 million articles) and is being very well used. We will add another 40 million articles into the service by 2011.
Today I am going to mainly talk about data enhancement by the public, including text correction. However I would just like to say that the development of the public search service is only one aspect of the overall program. Behind the scenes we have been undertaking significant software development. We have designed 2 systems: the Newspapers Content Management System, which includes Quality Assurance Modules, and the Search and Delivery System. We have also upgraded our infrastructure and purchased 63 TB of storage so far for the national newspapers storage infrastructure. All aspects of the digitisation process are being outsourced (some offshore). The ANDP team of 6 has been responsible for all aspects of the project. In addition we have employed some university students on a casual basis to undertake the quality assurance processes.
On the technical side of things we are using a MySQL database and a Lucene search index. It was not our preference to undertake software development to the extent we have, but since there were no solutions available off the shelf we have gone down this path. It is our intent to share the code as open source for both systems sometime in the near future. We have had a lot of interest from other national libraries and institutions who wish to obtain the code and/or assist us with software development.
The development cycle for the search and delivery system was first to release a prototype to state and territory libraries for feedback in 2007. We then developed a beta version in 2008 which had a public release. It is our intent this year to move the beta version into a version 1 and officially launch the service very soon.
This is the home page of Australian Newspapers beta. Users can either keyword search or browse by date, title or state. The service is being heavily used with around 28,000 keyword searches per day and an unknown number of browses. We have not widely publicised the service as yet since we are still in a beta version.
People are predominantly searching for names in the service. This is a visual image of search terms. The most searched names are John, William, Thomas, George and James.
Search phrases also remain pretty similar from month to month, with phrases often being a personal name in combination with a term such as births, deaths, murders or shipping. In December 2008 ‘christmas’ was also a popular search term.
Most of the users have found out about the service from genealogy blogs and forums. This is an example of a popular international forum where the news of OCR text correction wings its way from Mary in Italy to William in Gateshead UK, to Zoe in London, to Uncle John in Bedfordshire, and then to Harry’s mum in Brisbane in a matter of minutes.
The features discussed on the forums that the public are using are adding tags and comments to articles, and correcting the text within articles. We’ll look at each of these in turn.
Firstly, when a user comes to the site they can choose whether or not they want to log in. It is not mandatory to log in, even when using the tag, comment or text correction features. The benefit of logging in is that users can track their activity and, if they are a top corrector, they may appear in the top correctors hall of fame. To date we have 3000 registered users, out of over 300,000 unique users. Of the 3000 registered users, 1300 are regular text correctors. We do not know how many unregistered (anonymous) users are correcting text.
Newspapers have a hierarchy of issue, pages in an issue, and articles on a page, which is reflected in the system. It is easy to navigate between the levels when browsing newspapers. This shows the page level view. On this screen you can move the frame splitter on the left to entirely hide the left bar and view only the newspaper image if you want. To access the enhancement features the user needs to go to the article level. If you do a keyword search instead of a browse you will come to article view immediately.
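As a rough illustration of the issue, page and article hierarchy described above, here is a minimal Python sketch. It is not the actual ANDP data model: the class names, field names and the `search` helper are all assumptions made for the example, and the sample newspaper is invented.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Article:
    heading: str
    ocr_text: str  # electronically generated text from the OCR step


@dataclass
class Page:
    number: int
    articles: List[Article] = field(default_factory=list)


@dataclass
class Issue:
    newspaper_title: str
    date: str  # ISO date string (format is an assumption)
    pages: List[Page] = field(default_factory=list)

    def search(self, keyword: str) -> Iterator[Tuple[int, Article]]:
        """Yield (page number, article) for articles whose text contains keyword."""
        needle = keyword.lower()
        for page in self.pages:
            for article in page.articles:
                if needle in article.ocr_text.lower():
                    yield page.number, article


# Invented sample data, purely for illustration.
issue = Issue(
    "Example Gazette",
    "1900-01-01",
    [
        Page(1, [Article("Shipping news", "The barque arrived on Tuesday.")]),
        Page(2, [Article("Births", "A daughter to Mrs Smith.")]),
    ],
)
hits = [(number, article.heading) for number, article in issue.search("barque")]
```

Browsing walks down the levels (issue, then page, then article), while a keyword search jumps straight to the matching article, which mirrors the two entry points described in the talk.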
This is the article view. Users can zoom in or out and choose to view the article in the context of the entire page. They can also navigate to any other page within the newspaper issue. The electronically generated text created through the OCR process is displayed on the left hand side. This is also where the users can use the 3 enhancement features. Users can tag the article with keywords and they can write comments and notes about the article. If users log in they will be able to choose to make their tags and comments public or private. So they can share their comments with all users or they can add their own private research notes that only they can access. One feature that we believe is innovative, and not available in any other online newspaper service, is the ability for the user to correct the electronically generated text. There are a number of reasons why the electronically created text is not always 100% accurate, mainly due to the quality of the original newspaper that the image was created from. Users can correct the text by clicking on the ‘Help fix this text’ button. We will now use these features on this article. The article we are looking at is the first report in an Australian paper of the sinking of the Titanic. It’s in the Northern Territory Times on 19 April 1912.
I want to tag the article with ‘titanic sinking’. If a user does not log in when they first enter the service then the first time they want to enhance an article they will be offered the option to log in. At this point they can either log in or enter the captcha to verify they are human (and not a robot attempting to do something undesirable).
Once logged in or verified with captcha a user can enter their tags.
Now I want to add a comment. Those of you who read this article may have noticed that it was reported that all passengers were safely rescued from the Titanic and the weather was calm. I’ll just add a comment to say this was unfortunately not the case.
Now I have zoomed in on the image and if the OCR text was inaccurate I would edit it in the box on the left. In this article the text is actually very accurate so has either OCR’d very well, or already been corrected by someone else.
Now we can review the article with all the enhancements we have made showing on the left: tags, comments and corrections. We can view the history of all the enhancements (both our own and other people’s). So those were the basics, but let’s take a closer look at users’ activity with the enhancement features over the last 6 months.
Adding tags has been a hugely popular activity for users. 46,000 tags have been added. However of these the vast majority are for personal names and only 34 tags have been used more than 100 times. This has led not to a useful tag cloud, but to tag fog! The screenshot shows the ‘John’ fog. Most of the tags have been used fewer than 10 times. Of the 46,000 tags, 16,500 are unique. The use of tags is surprising because we were initially dubious about the value of tags for articles when every article is full-text searchable, and if the name you are looking for is incorrect in the text you can edit it so that you can find it again. It certainly appears that people are using tags to try and track their research. Very few services, if any, have enabled tagging of full-text items. Most tagging is for image collections only, so what we are seeing here is new to us.
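The kind of tag statistics quoted here (total tags, unique tags, heavily used tags, rarely used tags) can be computed with a simple frequency count. The sketch below is only an illustration in Python, not the service's reporting code. The function name and thresholds are hypothetical, and folding case and whitespace before counting is an assumption that the real service may not make.

```python
from collections import Counter


def tag_stats(tags, popular_threshold=100, rare_threshold=10):
    """Summarise tag usage: totals, unique tags, popular tags and rare tags."""
    # Assumption: tags that differ only by case or surrounding spaces are the same tag.
    counts = Counter(tag.strip().lower() for tag in tags)
    return {
        "total": sum(counts.values()),
        "unique": len(counts),
        # Tags used more than popular_threshold times stand out from the "fog".
        "popular": sorted(t for t, n in counts.items() if n > popular_threshold),
        # Number of distinct tags used fewer than rare_threshold times.
        "rare": sum(1 for n in counts.values() if n < rare_threshold),
    }


# Tiny invented sample with low thresholds so the effect is visible.
sample = ["John"] * 3 + ["john "] + ["murder"] * 2 + ["shipping"]
stats = tag_stats(sample, popular_threshold=3, rare_threshold=2)
```

With real data the default thresholds correspond to the figures in the talk: tags used more than 100 times, and tags used fewer than 10 times.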
The most used tag (one of only 5 that jump from the fog) is LRRSA, which we have now discovered is short for the Light Railway Research Society of Australia. They have 250 members who are using the tag to record their group research.
Tagging enables ‘marking’ or ‘saving’ of records into a group so that you can come back to them later. There is currently no other method to save a group of articles, other than bookmarking them.
Each user has a profile page where they can view their latest tagging, commenting and text correction activities. The user profile pages are visible to other users. At this stage users cannot edit their profiles. It is desirable however that users are able to edit and personalise their profiles so they can share information about themselves and their research interests with other users.
By browsing user profile pages we can see 2 distinct methods that people use to correct text. This first profile shows us that this user is looking at lots of different articles with a similar subject (flying saucers and UFOs) and just correcting a few lines in each article. The profile shows the article, the date changed, the old text and the new text.
The next user profile shows method 2: find an interesting article and then correct the whole article. Two of our top correctors are correcting long articles on gruesome murders, which is a popular theme. Text correctors report doing 1-3 hours of text correction at a sitting on average. The average visitor spends 17 minutes searching and reading articles in a session.
Several people can correct the same article. All corrections are saved and viewable in the history of the article. All versions of the corrections are searchable. It is the last correction that is visible in the left hand pane. Articles are corrected by many users when they are either very long, very significant, or very illegible. For example this article is in the first Australian newspaper, the Sydney Gazette and NSW Advertiser, of March 1803. Around 20 people have made corrections to this article. It is particularly challenging because of its use of the long s (ſ), which looks like an f, instead of the modern s.
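The behaviour described here (every correction kept in a history, the latest correction displayed, and all versions remaining searchable) can be sketched as a small data structure. This Python example is an assumption-laden illustration, not the service's implementation: the class and method names are hypothetical, and a real system would push each version into the search index rather than scan strings.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Correction:
    user: str
    old_text: str
    new_text: str


@dataclass
class ArticleLine:
    ocr_text: str  # raw OCR output, never overwritten
    history: List[Correction] = field(default_factory=list)

    @property
    def visible_text(self) -> str:
        """The last correction is what readers see; fall back to raw OCR."""
        return self.history[-1].new_text if self.history else self.ocr_text

    def correct(self, user: str, new_text: str) -> None:
        """Append a correction, recording the text it replaced."""
        self.history.append(Correction(user, self.visible_text, new_text))

    def matches(self, term: str) -> bool:
        """Search the raw OCR text and every corrected version."""
        needle = term.lower()
        versions = [self.ocr_text] + [c.new_text for c in self.history]
        return any(needle in version.lower() for version in versions)


# Invented example: OCR has misread a long s as an f.
line = ArticleLine("Tbe fhip arrived")
line.correct("user_a", "The ship arrived")
line.correct("user_b", "The ship arrived.")
```

Because nothing is overwritten, a mistaken or malicious edit never destroys the earlier text, and a search on either the old or the new wording still finds the line.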
This is the text correction history of this article, showing all the different users and what parts they corrected.
Another regular activity of text correctors is methodically working through the family notices to correct names in the births, marriages, and deaths columns. This is a perfect example of a barely legible births column in the image on the right. We can see that it has already been corrected by a user and we can view the corrections.
The raw OCR text has basically come out as rubbish (on the left) and users here have just fixed the names but not the rest of the words in the line. This means that other people will now be able to find these names.
The comments feature was originally for researchers to annotate articles. It changed its name from annotations to notes to comments after user feedback. Some users are annotating the articles and adding further information about the content of the article or people mentioned in the article.
Other users are adding comments on the physical state of the image or difficulties and questions they have around text correction. We have observed users using the comments to communicate with other users.
We are not moderating text correction and this was a risk that both we and the users were aware of. To date no vandalism of text has been reported to us or noticed by us. By being transparent about the lack of moderation and giving a high level of trust to our users we appear to have gained a committed, responsible and dedicated group of text correctors. Some have likened it to Wikipedia. However, if a user were to change something incorrectly, we can see from this example that it would not take long for another user to notice it and correct it. In the example 3 different users are correcting the same article and helping each other in a matter of minutes. The users are therefore moderating each other's corrections at the moment. In the worst-case scenario that something was changed totally incorrectly, other users would be aware of this since they can all still see the image. Also, the search engine searches all text, even corrections of corrections, so the original terms are still retrievable. Users have been using the comments field to communicate with each other and ask for help, as this example also shows. This is because there is no other forum for them to communicate with each other at present.
Since the release of the service in July 2008 text correction has remained consistent among a core group of 1300 correctors who have mostly been doing the same amount per month. Between 300,000 and 400,000 lines of text are corrected per month in 15,000 to 20,000 articles. There was a slight dip in November which was due to no new articles being added that month (which many users said de-motivated them). However, text correction increased in January, despite there still being no new content added. Perhaps a lot of people were staying inside in the 40 degree heat looking for things to do with air-con on?
In the first 6 months a total of 2 million lines in 100,000 articles had been corrected. The top 5 correctors had consistently remained in the top 5 each month and were working up to 45 hours per week on text correction. Top correctors are correcting up to 30,000 lines per month. We had many users saying that text correction is proving to be an ‘addictive’ or compulsive activity. They sat down to fix a few words for 5 minutes and before they knew it 3 hours had passed. This was very interesting.
Due to user demand we introduced the ‘hall of fame’ into the beta service. The top 5 correctors show on the home page and also in the hall of fame. Originally the hall of fame only showed the top 10 but users wanted to see more, so now it shows anyone who has corrected more than 5000 lines per month. Users are still asking for entire league tables, however, so they can see where they are in the big picture. This is a motivating factor for them. During development it was suggested that we would need to use gaming technologies to encourage people to correct text, but this has so far not proved necessary!
One of the things we have not been able to do is to measure the overall improvement in OCR accuracy. This is due to the difficulty of measuring the OCR accuracy of the raw text to start with. I have written a paper published in D-Lib this month (March/April) about the difficulties of measuring OCR accuracy. A simple but resource intensive solution may be to compare words in an article with words in a dictionary before and after text correction. We are supplied with a page level ‘word confidence’ figure from the OCR engine (where 0 is poor confidence and 1 is good). As a matter of interest we have plotted the text corrections in articles on a page against the page confidence levels provided by the OCR engine, for the entire corpus to date. The corrected lines have been scaled back by a factor of 7 so that they are more easily compared. The graph shows that corrections are above the page number curve for low confidence, below the page number curve for high confidence, and about the same for mid confidence. So lower confidence pages tend to attract slightly more corrections proportionally than higher confidence pages, but the effect isn’t that pronounced. Pages with low confidence make up 10.6% of the corpus and they get 16.7% of the corrections; pages with high confidence make up 20.4% of the corpus and they get just 18.4% of the corrections. 69% of the corpus is of average confidence and these pages get 64% of the corrections. It would be entirely feasible, as some users have suggested, to actively ‘serve up’ the articles on pages that have low page confidences if we wanted to target these for corrections.
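Two of the ideas above can be sketched in a few lines of Python. This is only an illustration, not a method the project has implemented: the function names, the tiny word list and the sample sentence are all hypothetical, and only the percentages in `BANDS` come from the figures quoted above. The first function shows the dictionary comparison before and after correction; the second shows how the share of corrections compares with the share of the corpus for each confidence band.

```python
import re


def dictionary_hit_rate(text: str, dictionary: set) -> float:
    """Fraction of words in `text` that appear in `dictionary` (0.0 for empty text)."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return 0.0
    return sum(w in dictionary for w in words) / len(words)


def correction_ratio(corpus_share: float, correction_share: float) -> float:
    """How many times more (or fewer) corrections a band gets than its size predicts."""
    return correction_share / corpus_share


# Hypothetical word list and text, for illustration only.
WORDS = {"the", "ship", "arrived", "in", "sydney"}
before = dictionary_hit_rate("tlie ship arrivcd in sydney", WORDS)
after = dictionary_hit_rate("the ship arrived in sydney", WORDS)

# Percentages quoted in the talk: (share of corpus, share of corrections).
BANDS = {"low": (10.6, 16.7), "mid": (69.0, 64.0), "high": (20.4, 18.4)}
ratios = {band: round(correction_ratio(c, k), 2) for band, (c, k) in BANDS.items()}
```

On the quoted figures, low confidence pages attract roughly 1.6 times their proportional share of corrections, while mid and high confidence pages attract slightly less than theirs.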
So after all this activity the most common question people kept asking me was “Who are these people?” and also “Why do they do it?” Some people even suspected that the text correctors were really library staff, which is not the case. The text correctors are real, normal people. We sent some of them a survey to find answers to our questions about how long they spend correcting, why they do it, what motivates them, what would motivate them to do more or less? The responses were very interesting.
The three main reasons for correcting text were:
- We’re helping to provide an accurate record of Australian History
- We want to record family names and help others as we go
- We think it is a useful cause that will help all Australians, the Library, and ourselves and we are willing to give time for this.
The motivating factors given were no different to those that motivate anyone to do anything: for example, they enjoy it, they have their own research goals, they think about the main outcome (i.e. making it better for everyone), they have been given a high level of trust and respect to do the job, and it is a challenge.
To maintain or increase their motivation they again gave standard motivational answers. Things we had not done which they would like were to give them detailed instructions on how to do the job, to create for them a feeling of team spirit and being part of a virtual community, to recognise their achievements and acknowledge they were making a difference, and lastly to give them more content. They said the more content they were given the more they would do. Many noted that we had not publicised the service in any way or called for volunteers and the potential to harness a lot more volunteers was vast.
All our top 5 correctors are Australians living in Victoria, New South Wales, and Queensland, with one in America. The five turned out to be 6 since one was a married couple sharing a logon to do research. Of the 6, 4 are female and 2 male. One is working full-time, one is a stay-at-home mum and 4 are retired. They are aged between 38 and 65. Three of the correctors are correcting as a volunteer ‘do good’ activity and trying to think up topics to correct, whereas the other 3 are correcting around their own areas of family history and local research. 2 of the 6 are also transcribing shipping records and births, marriages and deaths for other organisations. Here are some quotes from some of our top correctors. Julie is our top corrector and has corrected 2,500 articles so far. She is in her thirties and is a stay-at-home mum. She mainly corrects articles on local history and murder and corrects whole articles at a time. She says “I enjoy the correction – it’s a great way to learn more about past history and things of interest whilst doing a service to the community by correcting text for the benefit of others. I keep doing [it] because of the knowledge that you are doing something that will benefit future people that wish to access articles on their family history.”
Catherine is located in Washington DC and works full-time as the Director of an e-commerce company. She says “I enjoy typing, want to do something useful and find the content fascinating. I do it to benefit others”. Also she does not watch much TV. Lyn and Maurie, a retired couple, work on it together as part of their family history shipping research. They also do voluntary work for the mariners records. They say “We get sick of doing housework, we find text correction addictive and it helps us and other people. How can you not correct errors when you see them?”
Mick is recently retired from IT. He says “I thought I could be of some assistance to the project. It benefits me and other people. It helps with my family research. I would do more if I had broadband and did not have to share the computer with the rest of my family!” Fay is retired. She says “I enjoy the challenge, I need something to do in my spare time and it benefits me and others”.
Many of our current text correctors are genealogists. Genealogists do things that other groups of people may not. There is a genealogists’ ‘to do’ list that is circulating on blogs at the moment. It gives a useful insight into the life of a genealogist. One thing that is very important to them is what they call ‘random acts of genealogical kindness’, where they may do something helpful for someone else that will help them trace their family tree. They also do organised acts of kindness such as transcribing births, marriages and deaths records. Genealogists very quickly get to grips with new technology if it helps them access resources or achieve one of their objectives.
We have been gathering feedback from users for 6 months about the beta service and text correction in particular. The feedback has been overwhelmingly positive and thousands of suggestions and comments have been received. The feedback was gathered from a survey form, from e-mails, by observation of users, by statistics, and by lurking on forums and blogs (going into the users’ spaces). The users have given us valuable feedback so that we can better meet their needs. Some of their ideas match our own, and other ideas they have given us are innovative and fresh and we had not thought of them ourselves.
The main requests from users for improvements are as follows:
- Improve the text correction feature (so they can do more)
- Have more advanced searching, including the ability to define and search across enhancement layers, e.g. tags only, tags and corrected text only, tags and comments
- Have a communication mechanism, e.g. a forum
- Enhancement of user profiles
- More statistics, and where they are in the big picture of text correctors
- Alerting for new content coming into the database
- Guidelines for enhancement activities
Some of the questions we now have to answer are: - How can we improve the text correction functionality, and if there is a quick mode and a power user mode – what should they be like? This is a mockup of a possible improved method.
We’ve had lots of discussions about what tags and comments should be associated with – the issue, the page, the article, a line, a word. This is an early mockup of pinning a ‘post it’ type note to an article. Although visually this method made it easier for users to understand the difference between correcting text and adding a note, it created confusion since users did not know where to pin the note on the image, and when many users attached tags or comments to the same image you could no longer see the image. Hence we reverted to the textual approach. What the tag or comment is associated with is important when considering how to search across the enhancement layers. Also, if we enable searching of layers, e.g. tags and comments together, decisions will have to be made about how the relevancy ranking should work. At present the relevancy ranking is based on a number of standard things with the addition of categories. Hence a news article or family notice will appear higher in the list than an advertisement or sports result.
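One way a category-aware ranking of this kind could work is sketched below. This is an assumption for illustration only, not the service's actual ranking: the weights, the `rank` function and the sample hits are all hypothetical. The only thing taken from the talk is that news articles and family notices should rank above advertisements and sports results.

```python
# Hypothetical category weights: the talk says only that news articles and
# family notices rank above advertisements and sports results.
CATEGORY_WEIGHT = {
    "news": 1.5,
    "family_notice": 1.5,
    "advertisement": 1.0,
    "sports_result": 1.0,
}


def rank(hits):
    """Sort (title, category, base_score) hits by base score times category weight."""
    def score(hit):
        _, category, base_score = hit
        return base_score * CATEGORY_WEIGHT.get(category, 1.0)
    return sorted(hits, key=score, reverse=True)


# Made-up search hits with made-up base relevance scores.
hits = [
    ("Shipping advertisement", "advertisement", 0.80),
    ("Births column", "family_notice", 0.70),
    ("Cricket scores", "sports_result", 0.75),
    ("Murder trial report", "news", 0.60),
]
ranked_titles = [title for title, _, _ in rank(hits)]
```

With these made-up scores the births column and the murder trial report move above the advertisement, even though the advertisement had the highest base score. Searching tags and comments as extra layers would raise the further question of how much weight each layer should carry.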
The lessons we have learnt to date are that engaging with users and building virtual communities is just as important to the users as providing the data itself. They want to be part of a community. By giving the users a high level of trust we have built commitment and loyalty in the community. Another lesson we have learnt is that using the term ‘text correction’ is not always helpful. It implies that something will be corrected and the old version deleted, which has caused concern to stakeholders and to the public. However, as users undertake the activity it has become apparent that what they are doing is ‘enhancing’ or ‘enriching’ the data. They are actually creating layers on top of the original data, and all the layers can be transparent and separately or jointly searchable. The term ‘enhancement of data’ has not yet become common terminology in Australian libraries, but it will not be long before it does and is commonly understood by both the public and libraries. Lastly, we know that the Australian Newspapers service has had a big ‘social impact’ on people’s lives and the genealogical community. We are unable to quantitatively measure the impact or predict what may happen next.
Traditionally libraries have held the power and control over data but the Australian Newspapers service is shifting that power to the community. Recently Barack Obama speaking on community engagement and volunteering said “Don’t under-estimate the power of people who join together …. They can accomplish amazing things”. This is true. People want to achieve amazing things and we as librarians have the power to give them both the data and the tools to do this – they will do the rest themselves. The challenge for the library is now how to nurture, sustain and grow this virtual community we have created and their resulting activities.
The future potential of text enhancement is mind-boggling when you think of it in the world context. In Australia alone we have 21 million people, more than half of whom have internet access at home and so could potentially be volunteers. The FamilyIndexSearch project reports that in their first year they had 2000 volunteers and by their third year they had 160,000 volunteers correcting birth, marriage and death records. The Australian Newspapers program has the potential to match this easily. But why just think about Australian Newspapers? This functionality could be applied to many other full-text resources; indeed, a global centre could be established where users decide what types of materials from which countries they wish to enhance. The future is exciting and open.
That brings me to the end of my talk. I could of course talk a lot longer but I wanted to give you the opportunity to be able to ask me some questions. There is a full report on the activity of text correctors called ‘many hands make light work’ on the website. Thank you.