Thank you for inviting me to speak here today. I am going to talk about the potential of crowdsourcing for libraries. My talk is based on what I have learnt as manager of a crowdsourcing site and from talking to other managers of crowdsourcing sites. The views I express today are my own and are not necessarily those of the National Library of Australia, where I work.
Crowdsourcing is a new term, one which my spellchecker does not recognise and which has not been used much in a library context until now. There is no agreed definition of crowdsourcing, though there is a great Wikipedia article on it. My explanation of crowdsourcing, and of how it differs from outsourcing and social engagement, is this: crowdsourcing is usually done by a large group of unpaid volunteers, rather than a company, working towards a clear, big goal for the common good. The group may use social engagement techniques such as reviewing, marking, checking and identifying items, but rather than just helping individuals personally, these activities combine to produce a big overall achievement. Crowdsourcing may, but does not always, require a greater level of effort than social engagement; for example, rather than clicking a checkbox to rate something, you may be asked to read it and categorise it. Crowdsourcing projects almost always start with a big, seemingly unachievable goal.
Examples include making out-of-copyright books available electronically, transcribing birth, death and marriage notices so that they become searchable, and creating a free online encyclopedia.
Why should libraries even think about doing this? The answer is that there are 8 significant benefits for us. We can achieve goals that we would never have the resources, financial or staff, to achieve in-house or by outsourcing. Crowdsourcing galvanises people to work fast towards a goal, so results happen quickly. The community is actively engaged and we are able to make effective use of their knowledge.
The community is adding huge value to our collections and services, and in turn we are encouraging a sense of public ownership of and responsibility towards cultural heritage items, many of which hold significance for our nation. We build the trust and loyalty of our community, and through this activity we can demonstrate the relevance and value of libraries in our society today. In my talk this morning I am going to show you 8 brilliant examples of crowdsourcing. Two are from libraries and the other six have direct relevance to libraries. I’m going to explain the common factors in crowdsourcing and give some tips for crowdsourcing. Finally, I’m going to look at why libraries aren’t already doing it, and what we need to think about and change to move forward into this exciting area. This information has been gathered through interviews I have undertaken with other crowdsourcing site managers, asking them simple questions like ‘What lessons have you learnt?’. Crowdsourcing site managers have been contacting me since March, when I published a report called ‘Many Hands Make Light Work’, which was reviewed internationally and widely discussed. It was about the digital volunteers in the Australian Newspapers service, of which I am manager.
This is the first library example I am going to discuss: the Australian Newspapers service. It was released in August 2008 and contains millions of articles from out-of-copyright Australian newspapers published between 1803 and 1954. Since its release it has been heavily used.
It is very innovative because we not only allow, but actively encourage, all users to correct the electronically generated text of articles. The text quality is poor because it is raw OCR output and the original newspapers are mostly of very poor quality. The electronically generated text created through the OCR process is displayed on the left-hand side. This is also where users can use the 3 enhancement features: tagging articles, adding comments to articles and correcting the text. Of the three, text correction is the most popular and the most used. This feature is not available in any other online newspaper service, and so it has created a high level of interest from national libraries internationally. They have been watching us to see the results and the activity occurring around it, and thinking about its wider application.
The results have been astounding, both to the National Library of Australia and to the world in general. Over 6,000 users are actively correcting text each month, and so far they have corrected 7 million lines of text. They have also been using the other features, especially tagging, to further improve the quality and depth of the article information.
My second example is Picture Australia. This contains digital images from different Australian institutions. In 2006 a new feature was implemented in partnership with Flickr to encourage members of the public to upload their own photographs on particular subjects into the national collections, in order to improve the quality and depth of the collections. For example, modern-day people, places and events are topics we want the public to add.
The public were keen to do this and there is an active pool of volunteers who to date have added 55,000 images to our collections. The quality and standard of these images is very high.
My third example is FamilySearchIndexing, a site run by the Church of Jesus Christ of Latter-day Saints in Utah. In August 2005 they enabled the indexing part of the site, which encourages members of the public to view handwritten birth, death and marriage (BMD) records and transcribe them. These records are then transferred into the search system.
This is one of the largest sites of its kind. There are currently 160,000 volunteers around the world working on BMD records for different countries, including NZ and Australia. I don’t have the exact number of records transcribed, but it is thousands if not millions over the last 4 years. Volunteers are needed because most of the records are handwritten and so cannot be effectively OCR’d.
Example 4 is Distributed Proofreaders. They were established in 2000, originally to help Project Gutenberg. Their mission is to make out-of-copyright texts available for free online. They now work for anyone, and have volunteers in many countries, including Australia and NZ.
They have made 16,000 public domain books and journal issues available as e-books over the last 9 years, with their volunteers doing every step of the process through a distributed system: finding the books, scanning them, OCRing them, proofreading and marking them up, and finally converting them into e-books.
Wikipedia is, of course, our best-known crowdsourcing example. Although we may not be able to remember life before Wikipedia, it has actually only existed since 2001.
Its achievements have been immense, having a real effect on society. The English version of the encyclopedia has 3 million articles, but there are actually 250 different language editions containing a total of 10 million articles, with the German and Spanish versions being very large.
The set-up of Wikipedia is a bit like the Hells Angels, with each country having what they call a ‘chapter’. What most of you have probably not heard is that the German chapter of Wikipedia worked with the National Library of Germany to help it correct its Personal Names Data Authority File, and to integrate it with links both ways between Wikipedia and the national catalogue. Wikipedians worked at a breakneck pace, correcting and linking 20,000 names in just 2 weeks. It was fairly easy for the volunteers because Wikipedia already has its own personal names authority file to match against.
The most interesting example, in my opinion, is the one started in June this year by the UK newspaper The Guardian. There was a big controversy in the UK over MPs’ expenses, which caused a public outcry. As a result, the MPs’ expenses claim documents were to be made publicly available. The Guardian digitised them all and within a few days put up a public website where people could easily read them and mark those they thought were potentially scandalous and needed further investigation. Most of the claims were handwritten and largely illegible.
Within 80 hours, 20,000 volunteers had read and checked nearly half of the expense claims: a staggering 170,000 documents that would have been very boring, had it not been for their very personal nature. People were looking for juicy items such as expense claims for pornographic videos, and discoveries included a duck house costing $4000, modelled in every detail on a French chateau. Hardly something taxpayers should have to pay for.
My favourite site is Galaxy Zoo. I strongly recommend you have a look at it when you get home. It has hooked the world. It presents millions of digital images of galaxies that have never been seen before, and asks the public to help classify and identify them.
So far 150,000 volunteers have made over 50 million classifications of previously unseen galaxies. Exciting stuff!
An early example, and one which is no longer active, is the BBC WW2 People’s War. In 2003 the BBC set up an interactive website to enable the public to record their stories of WW2 and upload their photos and artefacts. It was mainly older people without any previous computer or keyboard skills who did this, and libraries helped by giving free internet access to those who wanted to contribute. A side outcome was the establishment of an active community whose members could communicate with each other online. The people in this group were very sad when the project closed and their group communication was shut down.
Over the life of the project, 32,000 people contributed their content. The whole site is now archived.
Another example, which one of my own digital volunteers alerted me to, is Mariners and Ships in Australian Waters. Its volunteers are transcribing shipping and other related lists. The original items are held in the state archives, but this site was instigated and set up by volunteers, not by the state archives. It has 600 volunteers.
There are other examples, but lastly I just want to mention the FreeUKGEN project, which has several parts. It is one of the oldest projects, starting in 1999, and the public are transcribing British BMD records, the census and other records. It is similar to the FamilySearchIndexing project. There is a real need for handwritten archives, manuscripts and records to be transcribed by hand so that they can become searchable and accessible.
In looking at all these sites, I have been trying to find out whether there are common factors in crowdsourcing, and whether what we are experiencing in Australian Newspapers is unique to our country and resource, or whether crowdsourcing would work just as effectively in other countries with other resources. I was also interested in finding out the lessons we have all learnt, so that we can apply them when we set up new crowdsourcing sites. My discovery is that there are commonalities in almost every project, and it is my belief that if libraries, as non-profit organisations, were to apply the tips for crowdsourcing I am about to share, they would undoubtedly be successful.
We’re now going to look at the common factors amongst the examples, which are: volunteer numbers and achievements; volunteer profiles; volunteer motivations; rewards and acknowledgement; and management of volunteers.
All the projects started very quietly, and most continued without any fanfare, publicity or marketing. Initially the numbers of volunteers were very low, but through viral marketing (forums and blogs) volunteer numbers increased exponentially. All the sites wondered what would happen if they ran an advert on TV. In all cases volunteers did far more work, to a higher standard, than expected and made significant achievements.
The most common questions people ask me are “Who are the volunteers?” and “Why do they do it?” Some people suspected that our text correctors were really library staff, which is not the case. The text correctors are real, ordinary people: anyone and everyone. To find out the answers to these questions I sent some of our volunteers a survey (as had Distributed Proofreaders and FamilySearchIndexing). Our survey results matched those of the other sites and were very interesting.
The majority of the work is done by ‘super’ users or volunteers: the top 10% of volunteers can do as much as 89% of the work. Their ages vary. They are not all older people, as some imagine; in fact it is highly likely that the moderators, those with extra responsibilities, or the super users are dynamic young professionals with full-time jobs. There are retired people, but also stay-at-home mums and disabled or sick people. The volunteer profile is broad. The volunteers all note that they learn new things, that they prefer working for non-profit organisations because they don’t feel taken advantage of, and that they often work on multiple voluntary projects. 50% are doing it because they want to do some voluntary work, and the other 50% because they are really interested in the topic. 50% want to choose what work they do (e.g. ‘I want to do this book’), and the rest want to be given work (i.e. ‘What’s the next page to be done?’).
The motivating factors people gave for doing online voluntary work were no different from those that motivate anyone to do anything: they enjoy it, it’s interesting and fun, and they are thinking about both their own personal goals and the group outcome. They like to think that what they are doing matters to their country or the world at large, so historical and scientific projects are especially big drawcards.
When given a high level of trust and respect, they want to repay it, and so work extra hard. When given a big goal they like the challenge: the bigger the better. Giving something back to the community and helping each other were often cited, and many of these projects proved, for unknown reasons, to be totally addictive, especially Galaxy Zoo and Australian Newspapers.
Not realising that volunteers already had such high and sustained levels of self-motivation, the sites had all initially asked them what would motivate them more. Their answers were: give us more stuff to do; raise the bar of the goal; a progress chart; online camaraderie; clear instructions; acknowledgement; and reward.
I just want to show you some of the profiles and responses from our survey of Australian Newspapers users. Our top 5 correctors are Australians living in Victoria, New South Wales and Queensland, with one in America. The five turned out to be six, since one was a married couple sharing a login to do research. Of the six, 4 are female and 2 male. One is working full-time, one is a stay-at-home mum and 4 are retired. They are aged between 38 and 65. Three of the correctors are correcting as a volunteer ‘do good’ activity and try to think up topics to correct, whereas the other 3 are correcting in their own areas of family history and local research. Two of the six are also transcribing shipping records and births, marriages and deaths for other organisations.
Here are some quotes from some of our top correctors. Julie is our top corrector and has corrected 2,500 articles so far. She is in her thirties and is a stay-at-home mum. She mainly corrects articles on local history and murders, and corrects whole articles at a time. She says “I enjoy the correction: it’s a great way to learn more about past history and things of interest whilst doing a service to the community by correcting text for the benefit of others. I keep doing it because of the knowledge that you are doing something that will benefit future people who wish to access articles on their family history.”
Catherine is located in Washington DC and works full-time as the director of an e-commerce company. She says “I enjoy typing, want to do something useful and find the content fascinating. I do it to benefit others.” She also does not watch much TV. Lyn and Maurie, a retired couple, work on it together as part of their family history shipping research. They also do voluntary work for the mariners records. They say “We get sick of doing housework, we find text correction addictive and it helps us and other people. How can you not correct errors when you see them?”
Mick has recently retired from IT. He says “I thought I could be of some assistance to the project. It benefits me and other people. It helps with my family research. I would do more if I had broadband and did not have to share the computer with the rest of my family!” Fay, who is retired, says “I enjoy the challenge, I need something to do in my spare time and it benefits me and others.”
Very few of the sites had thought to give rewards or acknowledgement (they had initially associated this with money, of which they had none), but several, such as ours, had introduced rewards and acknowledgements suggested by users. All of these were simple and cost-free. The most requested were for individuals to be able to identify themselves to other volunteers, and sometimes to the public, and to see overall ranking tables showing where they fitted into the big picture. The ranking tables were more about the big picture than about competition. Other ideas were meeting the paid staff (who were surprised that this would be considered a reward) and certificates and promotional gifts.
All organisations agreed that management of volunteers was not a big task, nor should it become one. None had dedicated staff to manage volunteers (not even Wikimedia, which has 10 million volunteers). Instead they all agreed that the way to go was to have some volunteers manage others, and to set up communication and sharing software such as wikis and forums to minimise staff time. For example, instead of a staff member answering an enquiry, another volunteer who saw the question in the forum could answer it. The paid staff, who spent as little as an hour a week or less managing volunteers, saw their role as only creating, establishing or endorsing policies, FAQs and guidelines.
So, after all this talking, I am finally able to summarise for you 14 tips that you should implement on your site if you want to crowdsource effectively. I’m going to illustrate my points with screenshots from the sites I have discussed. I should say that no site does all of these things. I think this is largely because no-one has looked into crowdsourcing techniques as seriously as I have over the last few months and pulled all the pieces together. Therefore, if you set up a site which does all 14 things, I think you would be on to a winner! The first tip is to have a clear, big goal on your home page.
The next is to show your progress towards the goal. This simple red bar from the Guardian is very effective.
They’ve taken it to the next level by having progress bars on groups of records as well. They’ve also personalised this one by adding a photo of the MP which motivated people even more.
Next are Distributed Proofreaders and Wikipedia, whose front pages are updated in real time.
Your system has to be quick to get into and reliable once people are in. Think seriously about whether you want people to have to register first or whether they can contribute anonymously. You want as few barriers and clicks as possible, so that people can contribute quickly and on the spur of the moment. This is Australian Newspapers, where we decided it was not necessary to log in or register first, but users do need to complete a CAPTCHA once per session to stop spammers and robots.
It must be both easy and fun. Many of the sites that rely on the human eye show the original image on the left and the action or questions you need to answer on the right. Simple, large boxes are key. Here is the Guardian expenses site again.
There are only 2 actions to make, and the wording on the buttons is very encouraging.
Here we are in Galaxy Zoo, starting the identification process for an image of a galaxy with our first simple question.
This is followed by 2 more simple questions. The boxes are clear, easy and quick to just click on.
In Australian Newspapers no knowledge of wiki editing, HTML or markup is required. You simply look at the image, click on the text to correct it, and then save it, all on the left.
All sorts of interesting things are discovered in these projects, often including outcomes you had not expected, as well as progress towards your goal. It is really important to remember to tell all your volunteers about these discoveries, because it spurs them on.
Guardian: don’t you just want to click on ‘best individual discoveries’?
Here is the ‘hall of fame’ from the Australian Newspapers service. The top 5 correctors are shown on the home page as well as in the hall of fame. Originally the hall of fame showed only the top 10, but users wanted to see more, so now it lists anyone who has corrected more than 5,000 lines in a month. Users are still asking for full league tables, however, so that they can see where they fit in the big picture. This is a motivating factor for them. During development it was suggested that we would need to use gaming technologies to encourage people to correct text, but so far this has not proved necessary!
The Guardian implemented ranking tables as well.
Picture Australia acknowledges outstanding contributors by name, publicly (if they agree), and in newsletters and library publications.
The remaining tips are on this slide.
Tip 7. The content must be interesting (history, science, animals, personal, topical, e.g. the Guardian scandals).
Tip 8. Give volunteers options to be visible (to each other and to the public, via profiles, on items they have created or helped with, names of galaxies).
Tip 9. Give volunteers an online team environment (e.g. a wiki or forum) for camaraderie and fun.
Tip 10. Give volunteers choices (do the next item, or pick something).
Tip 11. Assume it will be done well (to build trust and expectation).
Tip 12. Keep the site alive (new content, activity).
Tip 13. Take advantage of topical events (news, disasters, anniversaries, deaths etc., as Wikipedia does).
Tip 14. Listen to your ‘super’ volunteers carefully. Whatever they say is important: they are your heaviest users.
The future potential of crowdsourcing with digital volunteers is mind-boggling when you think of it in a global context, and of how many people have internet access. In Australia alone we have 21 million people, more than half of whom have internet access at home and so could potentially be volunteers. The FamilySearchIndexing project reports that in its first year it had 2,000 volunteers and by its third year it had 160,000 volunteers transcribing birth, marriage and death records. The Australian Newspapers program is set to match this easily. Libraries have lots of data to expose, and crowdsourcing could make a radical difference in opening up access to archives, especially for tasks where technology cannot do better than the human eye and brain. Most of the examples I have given required manual work using the human eye and brain, with fantastic results.
I have a big vision: a ‘global vision’. I’m suggesting that libraries and archives need to think about crowdsourcing, but not on an individual basis. There should be a global pool of volunteers and a global pool of work for them to do in library and non-profit projects. Digital users do not see institutional walls. We know this. We have tried hard to break down the walls for digital access via federated searches and national services, e.g. DigitalNZ, Picture Australia, Matapihi and Trove. But we also need the walls to be down for crowdsourcing projects. For example, if you want to improve or transcribe text on shipping lists, you should be able to come to a central portal to find all the projects and countries involving shipping lists, rather than having to first know about, and then go separately to, Australian Newspapers, FamilySearchIndexing and Mariners and Ships in Australian Waters.
So you may be thinking, “Why should libraries do this? Why not leave it up to non-profit organisations like Wikipedia?”
Part of the answer is that we have the content, we have the technology and we have public support.
The other part of the answer is that we are different from Wikipedia, Google and Amazon. We are different because we are “ALWAYS AND FOREVER”. Libraries make promises about their content: long-term preservation and access, freedom from commercial pressures, and universal access, “free for all”, ALWAYS AND FOREVER.
We know we can do it because technology enables libraries to evolve and stay relevant. So what’s stopping us? It’s all about power and control.
Let’s look at power in our society over the ages. First we had the divine right of kings, theocracy, and important decisions made by a privileged and powerful few. And then we had reason, liberty and democracy.
Hmm, this is very similar to information power. First, information was produced by a relatively few large and powerful publishers, discovered through metadata hand-crafted by librarians, and was expensive and centralised. Then came the web, and now information is produced by anyone, discovered through full text and bottom-up linking effects, and is cheap and distributed.
For libraries this was scary and threatening. Some lost confidence, especially with so many people asking us if we are needed any more. But of course we are: we need to evolve and move with the times. Remember, we are needed because of our ALWAYS AND FOREVER position.
However, the concerns of some libraries, archives and museums about social engagement and crowdsourcing activities centre on loss of power and control. The concerns listed on this slide are vandalism (or, at the other extreme, disinterest), corrupted data, loss of control and loss of power. I have addressed each of them with cultural heritage managers, and through experience on the Australian Newspapers project I am able to say that they have all been disproven. None of these things happened; in fact the reverse did. So good things can happen when you are not a power and control freak.
In my opinion, when the public and libraries join together we have a ‘super power’, and amazing things can be achieved. Everyone in this room has the power to make this happen in one way or another. There is a lot of ‘cognitive surplus’, as Clay Shirky would describe it, just waiting to be harnessed. The examples have demonstrated this. Barack Obama said “Don’t under-estimate the power of people who join together …. They can accomplish amazing things”. This is true, but the public could do even more if libraries committed to proactively enabling this on a much larger scale. We know we can do it technically, and that’s not what’s holding us back. In my experience of managing IT projects over the last few years, it is very rarely technical issues that hold us back; it’s other things. For example, it was not technically hard to implement text correction in newspapers, so why had no one done it before? It required creative thinking to solve a problem, and letting go of some of our library rules about who can do what, why and when (rules are made to be broken…).
I don’t think it is power that we should be seeking to retain. There is something bigger than that: “Freedom is actually a bigger game than power. Power is about what you can control. Freedom is about what you can unleash.” This quote really resonates with me.
So where to from here? What should libraries be thinking about in relation to crowdsourcing? Firstly, we need to have social engagement strategies and techniques in place, and if we haven’t, build them into our day-to-day work. We used to be really good at social engagement, but in the rush to deliver collections digitally rather than face to face we lost a lot of it. Users want it back. Then we need to think about what we want help with and why we want that help. What would be our main goal in crowdsourcing: making collections more accessible, improving the quality of data, adding new content, or social engagement?
After we’ve figured this out, we need to look at the bigger picture: building partnerships with existing non-profit crowdsourcing organisations and sharing volunteers; working towards a global pool of volunteers and projects; how we market crowdsourcing in libraries (we’ve never been very good at marketing ourselves, let alone crowdsourcing); and how we change our strategic thinking from power to freedom.
I hope you have found my talk interesting, and maybe even inspirational. I want you to go away today thinking about these ideas for crowdsourcing and talk to your colleagues about them. I hope you come up with ideas and are then able to make them happen. It is my firm belief that crowdsourcing does have a place in libraries. By owning rich content, having no commercial intent, and wanting universal and free access to resources, we surely will be able to harness the energy, enthusiasm and knowledge of the public to successfully help us in our task. Thank you.
Crowdsourcing and Social Engagement: Potential, Power and Freedom for libraries and users
Keynote presentation by Rose Holley, Digital Librarian
Pacific Rim Digital Library Alliance (PRDLA) Conference,
Libraries at the end of the world: Digital Content and Knowledge Creation
Thank you for inviting me to speak here today. In the next hour I am going to talk about the potential of crowdsourcing for libraries.
I am speaking in a personal capacity, and my talk is based on my own recent research in this area (published in the E-LIS repository). I will draw on my own experience as manager of a crowdsourcing site, and on the experiences of other crowdsourcing site managers that I have interviewed.
I am expressing my own personal viewpoints today, not those of any particular library or organisation.