JTS 2010 Presentation: "Audiovisual Heritage and Participatory Culture" by Johan Oomen
As the Web gets more “social” and as museums, libraries and archives begin to offer online access to digital representations of their collections, users and institutions are starting to inhabit the same shared information space. This is an exciting prospect: we are witnessing new paradigms for engaging users with our shared heritage. 'Netizens' (people actively involved in online communities) are shaping this information space using technological advances offered by cultural heritage institutions, publishers and other commercial entities, as well as objects from a great variety of sources. These new paradigms often call for profound change in institutional practice, for instance harnessing the power of the Social Web to enrich knowledge about our shared heritage. As a result, republication and reuse of heritage are enhanced, and its value increases.
This presentation focuses on:
- www.openimages.eu
- www.waisda.nl
The document introduces the IMPACT Centre of Competence, a not-for-profit organization that aims to advance digitization of historical materials. It provides tools, services, and testing facilities for practitioners in content institutions, researchers, and industry. Membership offers benefits like access to datasets and tools, implementation support, and knowledge sharing. The Centre will be sustained through membership fees and contributions to support continued collaboration in the community.
The document discusses the transformation of humanities research through digital technologies and optical character recognition (OCR). It describes efforts to extract over 2,000 years of Latin text from digitized books and track linguistic changes over time using machine learning techniques. Computational analysis is helping scholars build dynamic digital editions and study underrepresented languages on a massive scale.
- CLARIN aims to create a federated infrastructure providing researchers access to digital language data and tools through a single sign-on. It seeks to integrate existing resources across Europe to advance humanities and social sciences research.
- CLARIN's success requires collaboration with libraries, which hold vast amounts of printed materials indispensable for researchers but face obstacles like copyright and lack of standardization.
- The IMPACT project's work on optical character recognition technology and goal of an OCR center of expertise can help address a key challenge and bring CLARIN and libraries closer through continued collaboration beyond the project.
The document discusses linguistic resources created for improving access to 16th century German texts. It describes how the IMPACT project adapted resources like lexicons to account for the differences between historical and modern German. A groundtruth corpus spanning 1500-1950 was created, as well as a hypothetical lexicon of rule-based variants and a manually verified lexicon to map historical words to their modern equivalents. These resources were able to cover 30% of 16th century vocabulary and improve optical character recognition.
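The mapping from historical spellings to modern equivalents via a rule-based "hypothetical lexicon" can be sketched in a few lines. The rewrite rules below (v/u, th/t, ey/ei) are common historical-German patterns chosen purely for illustration; they are not the IMPACT project's actual rule set, and the real lexicon was built and verified at a much larger scale:

```python
# Illustrative rewrite rules mapping historical to modern German spellings.
# These three rules are examples only, not the project's rule set.
REWRITE_RULES = [
    ("v", "u"),    # "vnd"  -> "und"
    ("th", "t"),   # "thun" -> "tun"
    ("ey", "ei"),  # "seyn" -> "sein"
]

def modern_candidates(historical_word):
    """Generate candidate modern forms by applying each rewrite rule."""
    candidates = {historical_word}
    for old, new in REWRITE_RULES:
        extra = {c.replace(old, new) for c in candidates if old in c}
        candidates |= extra
    return candidates

def link_to_modern(historical_word, modern_lexicon):
    """Return the modern-lexicon entries matched by any candidate form."""
    return sorted(modern_candidates(historical_word) & modern_lexicon)
```

Candidates that also appear in a verified modern lexicon become the historical word's modern equivalents; unmatched candidates would need the manual verification step the document describes.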
The document summarizes research activities and tools developed by the National Center for Scientific Research "Demokritos" for the IMPACT project. It describes tools for border detection, page curl detection, and character segmentation. Evaluation results for the border detection and page curl detection tools on large datasets are provided.
Tomaž Erjavec discusses the development of language resources for historical Slovene, including transcribed texts, an annotated corpus, and a historical lexicon. Over 10 million words of historical Slovene texts have been transcribed. A reference corpus of 300,000 words from the 15th-19th centuries was annotated for part-of-speech and modern equivalents. An initial lexicon of 3,000 entries was expanded to over 20,000 entries incorporating forms from the annotated corpus. The resources aim to support research on and processing of historical Slovene texts.
The document discusses ABBYY's involvement in the IMPACT project. It states that ABBYY is the OCR technology provider for IMPACT members. It also notes that ABBYY improved its core OCR technologies for the recognition of old documents through its work on the IMPACT project, focusing on areas like image pre-processing, segmentation, character recognition, and export formats. The presentation provides examples of how ABBYY's technologies were enhanced between versions 9 and 10 for tasks like binarization, layout analysis, and character recognition of historical documents.
The document discusses digitization workflows for enhancing and segmenting documents for optical character recognition (OCR). It describes steps for image enhancement including border removal, page curl removal, and correction of arbitrary warping. It then discusses standalone methods for segmenting text lines, words, and characters without relying on character recognition. These include a hybrid text line segmenter and density-based word segmenter that have been evaluated on historical documents with promising results. The techniques allow digitization of documents with non-standard words or layouts.
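Recognition-free line segmentation of the kind described above is often introduced via a projection-profile baseline: sum the ink pixels per row of a binarised page and cut where the profile drops to background. The hybrid segmenter in the document is more robust than this; the sketch below only illustrates the underlying idea and assumes a 0/1 image with horizontal text lines:

```python
import numpy as np

def segment_lines(binary_page, min_ink=1):
    """Return (top, bottom) row ranges of text lines in a 0/1 ink image.

    A simple horizontal projection-profile baseline: a text line is a
    maximal run of rows whose ink count is at least `min_ink`.
    """
    profile = binary_page.sum(axis=1)     # ink pixels per row
    lines, start = [], None
    for row, ink in enumerate(profile):
        if ink >= min_ink and start is None:
            start = row                   # line begins
        elif ink < min_ink and start is not None:
            lines.append((start, row))    # line ends (exclusive bottom)
            start = None
    if start is not None:                 # page ends inside a line
        lines.append((start, len(profile)))
    return lines
```

Real historical pages have skew, touching lines, and marginalia, which is exactly why the document's hybrid method goes beyond a plain profile like this one.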
This document summarizes the results of experiments examining the effect of scanning parameters like color, resolution, and binarization method on OCR accuracy. The experiments found that bitonal images produced the best OCR results on average but the optimal method varied between images. Higher resolution images did not necessarily improve OCR accuracy. The quality of archival images was also found to affect OCR performance. The document concludes different scanning choices may be suitable depending on the document type and quality.
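As a concrete example of one binarization method that such experiments typically include, here is a plain-NumPy implementation of Otsu's global threshold, which picks the gray level maximising between-class variance. This is a generic illustration, not the specific method or evaluation setup used in the document, and the finding that the optimal method varies per image is exactly why a single global threshold is not always best:

```python
import numpy as np

def otsu_threshold(gray):
    """Return the global threshold maximising between-class variance (Otsu)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    total = hist.sum()
    sum_all = np.dot(np.arange(256), hist)
    best_t, best_var = 0, 0.0
    w0 = sum0 = 0.0
    for t in range(256):
        w0 += hist[t]                     # background pixel count
        if w0 == 0:
            continue
        w1 = total - w0                   # foreground pixel count
        if w1 == 0:
            break
        sum0 += t * hist[t]
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

def binarize(gray):
    """Produce a bitonal (0/1) image from a grayscale page."""
    return (gray > otsu_threshold(gray)).astype(np.uint8)
```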
The document discusses OCR for typewritten documents. It describes the IMPACT project, which is supported by the European Community under the FP7 ICT Work Programme and coordinated by the National Library of the Netherlands. The presentation covers the challenges of typewritten documents for OCR, the specific approaches used in the IMPACT project's TOCR system, and some example results showing its performance.
Paul Fogel of the California Digital Library examined OCR quality at scale using the corpus from the HathiTrust and its member institutions. The document discusses issues that arise when performing OCR at a massive scale, including the challenges of indexing very large document collections, supporting many different languages, and correcting the inevitable OCR errors produced when scanning and recognizing text from millions of pages.
The IMPACT Interoperability Framework provides a way to integrate various OCR and other software components into reusable workflows. It uses a Java-based architecture with web services and the open source Taverna workflow system. Developers can integrate new command line tools as web services with minimal effort, and workflows can then be built, shared, and executed through a web portal. The framework has been evaluated for scalability and is intended to support a community around sharing workflows and experiments.
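The framework's core move, exposing an existing file-in/file-out command-line tool so it can be chained into workflows, can be sketched as below. The actual framework generates Java web-service wrappers for Taverna; this Python sketch only shows the wrapping idea, and any specific tool name or flags you pass in are your own, not IMPACT components:

```python
import pathlib
import subprocess
import tempfile

def run_tool(command, input_bytes, suffix=".tif"):
    """Run a file-in/file-out CLI tool and return the output file's bytes.

    `command` is the tool invocation as a list (e.g. ["sometool", "--fast"]);
    the input and output file paths are appended as its last two arguments.
    """
    with tempfile.TemporaryDirectory() as tmp:
        src = pathlib.Path(tmp) / ("in" + suffix)
        dst = pathlib.Path(tmp) / "out.bin"
        src.write_bytes(input_bytes)
        subprocess.run(command + [str(src), str(dst)], check=True)
        return dst.read_bytes()
```

Because every wrapped tool has the same bytes-in/bytes-out signature, wrappers like this compose into pipelines, which is the property the workflow system exploits.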
The document describes CONCERT, an adaptive collaborative correction platform for digitized text. It uses feedback from users to improve optical character recognition and increase productivity of post-correction. Key features include adaptive OCR, quality control tools, productivity tools like games to motivate volunteers, and monitoring of users to prevent data corruption. It has been used successfully in several library digitization projects worldwide.
The document announces an IMPACT-myGrid hackathon held November 14-15, 2011 at the University of Manchester, focused on the myGrid and Taverna tools. Additional information is available on the event website at http://impact-mygrid-taverna-hackathon.wikispaces.com/.
The document outlines the roadmap for updates and new features in the Taverna workflow system, including releasing versions 2.3 and 3.0 with improvements to the user interface, support for new standards, and integration with additional technologies and domains like clouds, semantic web, and biodiversity. It also discusses new plugins and tools being developed to enhance provenance capture, support additional file formats, and provide domain-specific functionality for astronomy, life sciences, and data mining.
The document discusses named entity (NE) recognition in digitized historical texts. It describes how NEs like people, locations and organizations can be identified during optical character recognition (OCR) and retrieved for users. The key steps include building an NE lexicon database by collecting data, tagging and enriching NEs with metadata, and linking variant names. This helps improve OCR quality and allows users to find NEs despite spelling variations in historical texts.
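Linking variant names is essentially an index from a normalised key to a canonical entity, so that differently spelled historical forms resolve to the same record. The normalisation rules below (lowercasing, merging ij/y and ck/k) are illustrative assumptions, not the document's actual linking rules:

```python
def normalise(name):
    """Crude illustrative normalisation: lowercase, merge ij/y and ck/k."""
    return name.lower().replace("ij", "y").replace("ck", "k")

def build_index(entities):
    """Build a lookup index.

    `entities` maps each canonical name to its recorded variant spellings;
    every spelling (canonical included) is keyed by its normalised form.
    """
    index = {}
    for canonical, variants in entities.items():
        for spelling in [canonical, *variants]:
            index[normalise(spelling)] = canonical
    return index

def find_entity(index, query):
    """Resolve a query spelling to its canonical entity, or None."""
    return index.get(normalise(query))
```

A user searching for any spelling variant then retrieves the same entity record, which is the retrieval behaviour the document describes for historical texts.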
The document discusses an analysis of optical character recognition (OCR) results for historical documents. It describes creating language and error profiles to characterize documents, including spelling variations and common OCR mistakes. These profiles help adapt OCR and post-processing to each document. The document also presents an interactive system to efficiently correct OCR errors in historical texts by utilizing the document profiles.
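An error profile of this kind records which character confusions an OCR engine makes on a given document, and correction candidates follow by inverting those confusions. The pairs below (rn/m, c/e, 1/l) are typical illustrative examples, not a measured profile from the document:

```python
# Illustrative OCR confusion pairs: wrong substring -> likely intended text.
CONFUSIONS = {"rn": "m", "c": "e", "1": "l"}

def correction_candidates(token):
    """Apply each recorded confusion once, at every position it occurs."""
    out = set()
    for wrong, right in CONFUSIONS.items():
        i = token.find(wrong)
        while i != -1:
            out.add(token[:i] + right + token[i + len(wrong):])
            i = token.find(wrong, i + 1)
    return out
```

In an interactive correction system, candidates generated this way would be ranked (e.g. against the document's language profile) before being offered to the corrector.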
The document provides an overview of language work being done in the IMPACT project to improve optical character recognition (OCR) of historical documents. It discusses the development of lexicons for various languages to incorporate historical spelling variations that can help OCR more accurately recognize words. Computational tools are being developed and adapted to assist with building lexicons from corpus materials and dictionaries. Challenges include a lack of resources for some languages and dealing with special characters. The work involves collaboration between institutes to share knowledge and resources for lexicon building across languages.
Digitization, industrialisation - sport broadcasting challenges and the value... by FIAT/IFTA
Mediaset Premium is transitioning its sports broadcasting operations from analog to digital. This involves digitizing processes and technologies as well as evolving roles and responsibilities. Key aspects of the transition include developing integrated digital workflows, centralized content management, and redefining roles like journalists and producers to work in a digital "content factory" model. The goal is to maximize the value of content assets across multiple platforms.
Muehlberger - PrestoPrime case study 2 @EUscreen Mykonos by EUscreen
This document discusses the digitization of audiovisual materials at the University of Innsbruck. It outlines the university's plans to digitize over 90,000 hours of audiovisual content from its collections over the next 5-10 years. As a pilot project, the document focuses on digitizing 2000 VHS cassettes containing 3000 hours of video from the Slavonic Studies Department. It describes the proposed mass digitization process, including using a custom VHS digitization machine to capture content, extracting descriptive and technical metadata, and ingesting the content and metadata into a digital preservation system for long-term access. The goals are to develop an institutional strategy for digitizing and preserving all analog audiovisual materials at the university.
Designing Smart Things: user experience design for networked devices by Mike Kuniavsky
In this workshop Mike Kuniavsky, author of "Smart Things: ubiquitous computing user experience design" introduces concepts of user experience design for the post-PC/post-phone world.
How do you design experiences that transcend a single device, or even a family of devices? How do you create experiences that exist simultaneously in your hand and in the cloud?
Using plentiful examples drawn from cutting edge products and the history of technology, the workshop describes underlying trends, shows the latest developments and asks broader questions.
This presentation introduces fundamental concepts of ubiquitous computing user experience design and specific techniques for designing services and interfaces.
Topics include:
- Design for multiple scales
- Design for services used by multiple devices
- Rethinking everyday objects and experiences
- Understanding use context
Cooperation in the Digital Age: Building the Library Platform by Constance Malpas
This document discusses building cooperative library infrastructure in the digital age. It argues that libraries must work together and with other institutions to aggregate and share digital collections and data. Specifically, it notes that libraries are shifting resources from local print collections to licensed electronic materials and digital formats. To better support research, libraries are focusing on making their special collections and institutional assets more discoverable outside their own institutions. The document advocates for a shared infrastructure approach where libraries pool resources and collections to create network effects that benefit all participants.
Live to e-Learning, a lecture capture and delivery service based on MediaMosa by MediaMosa
L2L (Live to e-Learning) a lecture capture and delivery service based on MediaMosa. Presentation by Matteo Bertazzo from CINECA InterUniversity Consortium at the MediaMosa Community day, November 25, 2010
From Essence to Assets. Making sense of an audiovisual archive by Brecht Declercq
As presented on November 5, 2016 at the Impact Hub in Athens, Greece, as a part of the Audiovisual archiving workshop of the Interfaces Projects supported by the European Commission
Thinking the archives of 2020: Opportunities, priorities, issues by FIAT/IFTA
This document summarizes a discussion between members of broadcasting archives organizations about priorities and challenges for archives in 2020. The discussion covered many topics, including storage formats and migration, rights management, metadata automation, user interfaces, and financing models. Participants shared their individual organization's priorities, such as NHK's focus on high resolution content and rich navigation or RAI's projects to digitize archives and automate rights management. Overall, the discussion aimed to identify common issues and opportunities to develop strategies together for the future of broadcasting archives.
The Big Data Is A Significant Subject Of Modern Times With... by Sarah Gordon
Big data is a significant topic as technologies like smartphones and computers generate large amounts of data daily. Companies need platforms to not only store but also analyze this data quickly, such as Google's BigQuery which runs in the cloud and provides real-time information. The document discusses how BigQuery manages vast amounts of both structured and unstructured data for Google's needs.
The document summarizes a presentation about high performance computing applications in the petroleum industry given by Dr. Leonid Sheremetov of the Mexican Petroleum Institute. It discusses the challenges of exploration and production for PEMEX and outlines IMP's research program including grid-based simulation, data mining, and task optimization on clusters and desktop grids. Specific applications mentioned include reservoir simulation, seismic analysis, data mining of production data, electron microscopy, and a data mining project between IMP and other Mexican institutions.
This document summarizes a presentation about high performance computing (HPC) in the petroleum industry given by Dr. Leonid Sheremetov of the Mexican Petroleum Institute (IMP). It outlines the challenges of HPC in petroleum exploration and production. It provides an overview of IMP's research program in applied mathematics and computing, including their use of HPC. It then summarizes several of IMP's research projects applying HPC to problems in petroleum such as reservoir simulation, data mining of petroleum data, and distributed computing applications.
The document discusses the reasons for and methods of digitizing materials in libraries. It outlines why digitization provides improved access and preservation of collections, what types of materials can be digitized, and how the digitization process works through tools like scanning and metadata capture. The document also considers who should perform the digitization work and where it could take place within the library or through outside contractors.
This document provides an overview of principles of multimedia including definitions of multimedia, its characteristics, applications, building blocks, and relationship with the internet. It also discusses topics like multimedia architecture, user interfaces, hardware support, distributed multimedia applications, streaming technologies, multimedia databases, authoring tools, and multimedia document standards.
The document discusses emerging trends in library networks in the new millennium, including the growth of digital resources and collections, developments in digital library technologies, and the future of networked digital resources. Some key points discussed are the exponential growth of information, transition from physical to digital media, consortium approaches for accessing content, developing digital collections and repositories, and emerging technologies like semantic retrieval and knowledge sharing platforms. The future of library networks is envisioned to include fluid and transient multimedia resources, free and flexible virtual information spaces, global and personalized access, and more emphasis on informal knowledge exchange and social relationships.
Myths, Challenges and Advances in Power & Signal Distribution for Live Event... by Bob Vanden Burgt
The document discusses myths, challenges, and advances in power and signal distribution for live production over the past decade. Digital networking and power distribution requirements have changed substantially, presenting unique reliability and portability challenges with the tight integration of lighting, media, video, and audio in touring shows. The session will provide an overview of some transport protocols, network topologies, and more contemporary methods for distributing power and data in complex and changing production environments.
A Short Course on the Internet of Things by Prasant Misra
This document provides an overview of a short course on the Internet of Things (IoT). The course content is divided into four sections: IoT Primer, IoT Architecture, IoT "Last-mile" Considerations, and Derivatives for Intelligence. It discusses key topics like IoT history and trends, functional architecture, field devices and standards, and how machine learning can be applied to IoT data. The course aims to provide foundational knowledge on IoT technologies and applications.
Digitisation-Industrialisation: Sport Broadcasting Challenges and the Value o... by FIAT/IFTA
The document discusses Mediaset Sport's digital transformation journey in broadcasting sport content. It describes Mediaset Sport's process before digitization, including analog production and distribution. It then outlines Mediaset Sport's roadmap for evolving to digital and file-based workflows across the entire content production process from ingest to publishing. This includes implementing a central newsroom system, tapeless workflows, and integrated search and cataloging.
The presenter is the managing director of Klokan Technologies GmbH, a small Swiss company that develops innovative geo applications for cultural heritage institutions. The document discusses Old Maps Online, a project that provides an easy-to-use gateway for searching historical maps from libraries around the world. It allows users to search maps by geographic location on an interactive world map and view high resolution maps from contributing institutions with proper crediting back to the libraries. The project is open to additional map contributors and uses tools like BoundingBox and Georeferencer to help enrich map metadata.
The document summarizes image retrieval techniques and applications at the BnF (French National Library). It discusses using deep learning for image segmentation, classification, and indexing. It then describes several BnF projects applying these techniques, including GallicaSimilitudes for visual similarity search of collections, GallicaPix for iconographic retrieval and digital humanities case studies, and collaborations with INRIA on object detection in manuscripts and iterative querying. The goal is improved search and access to the diverse range of images in BnF collections.
Similar to IMPACT Final Conference - Majlis Bremer Laamanen (20)
Slides of the paper Deep Learning-Based Morphological Taggers and Lemmatizers for Annotating Historical Texts by Helmut Schmid at the 3rd Edition of the DATeCH2019 International Conference
This document discusses using text models to improve the accuracy of optical character recognition (OCR) on Chinese rare books. It conducted experiments using n-gram, backward/forward n-gram, and LSTM models on OCR data from ancient medicine books. The backward and forward 4-gram model achieved the highest correction rate at 97.57%. Mixing the LSTM 6-gram model with the OCR's top 5 candidates and the probability of the top candidate further improved accuracy to 97.71%, demonstrating that combining text models with OCR probabilities corrects OCR errors better than text models alone. In conclusion, text models are effective for increasing OCR accuracy on rare books, with the backward/forward 4-gram and LSTM 6-gram models performing best.
Slides of the paper Turning Digitised Material into a Diachronic Corpus: Metadata Challenges in the Nederlab Project by Katrien Depuydt and Hennie Brugman at the 3rd Edition of the DATeCH2019 International Conference
Slides of the paper Standoff Annotation for the Ancient Greek and Latin Dependency Treebank by Giuseppe Celano at the 3rd Edition of the DATeCH2019 International Conference
Slides of the paper Using lexicography to characterise relations between species mentions in the biodiversity literature by Sandra Young at the 3rd Edition of the DATeCH2019 International Conference
Slides of the paper Implementation of a Databaseless Web REST API for the Unstructured Texts of Migne's Patrologia Graeca with Searching capabilities and additional Semantic and Syntactic expandability by Evagelos Varthis, Marios Poulos, Ilias Yarenis and Sozon Papavlasopoulos at the 3rd Edition of the DATeCH2019 International Conference
Slides of the paper Curation Technologies for a Cultural Heritage Archive: Analysing and transforming a heterogeneous data set into an interactive curation workbench by Georg Rehm, Martin Lee, Julián Moreno Schneider and Peter Bourgonje at the 3rd Edition of the DATeCH2019 International Conference
Slides of the paper Cross-disciplinary collaborations to enrich access to non-Western language material in the Cultural Heritage sector by Tom Derrick and Nora McGregor at the 3rd Edition of the DATeCH2019 International Conference
Slides of the paper Tribunal Archives as Digital Research Facility (TRIADO): new ways to make archives accessible and useable by Anne Gorter, Edwin Klijn, Rutger Van Koert, Marielle Scherer and Ismee Tames at the 3rd Edition of the DATeCH2019 International Conference
Slides of the paper Improving OCR of historical newspapers and journals published in Finland by Senka Drobac, Pekka Kauppinen and Krister Lindén at the 3rd Edition of the DATeCH2019 International Conference
Slides of the paper Towards a generic unsupervised method for transcription of encoded manuscripts by Arnau Baró, Jialuo Chen, Alicia Fornés and Beáta Megyesi at the 3rd Edition of the DATeCH2019 International Conference
Slides of the paper Towards the Extraction of Statistical Information from Digitised Numerical Tables - The Medical Officer of Health Reports Scoping Study by Christian Clausner, Apostolos Antonacopoulos, Christy Henshaw and Justin Hayes at the 3rd Edition of the DATeCH2019 International Conference
Slides of the paper Detecting Articles in a Digitized Finnish Historical Newspaper Collection 1771–1929: Early Results Using the PIVAJ Software by Kimmo Kettunen, Teemu Ruokolainen, Erno Liukkonen, Pierrick Tranouez, Daniel Antelme and Thierry Paquet at the 3rd Edition of the DATeCH2019 International Conference
Slides of the paper OCR-D: An end-to-end open-source OCR framework for historical documents by Clemens Neudecker, Konstantin Baierer, Maria Federbusch, Kay-Michael Würzner, Matthias Boenig, Elisa Hermann and Volker Hartmann at the 3rd Edition of the DATeCH2019 International Conference
- The document describes a project to fill gaps in knowledge about diamond mining, trading, and polishing in Borneo by developing a workflow using various CLARIAH tools and resources.
- The workflow involved digitizing a diamond encyclopedia, extracting concepts and place names, linking the data to external sources to create linked open data, and querying newspaper archives to build a corpus of relevant articles.
- Promising results showed mining, trading, and polishing continued in Borneo for Southeast Asian customers, and described previously unknown diamond fields and polishing locations in Borneo. The project aims to apply the workflow to other commodities like sugar.
Slides of the paper Automatic Reconstruction of Emperor Itineraries from the Regesta Imperii by Juri Opitz, Leo Born, Vivi Nastase and Yannick Pultar at the 3rd Edition of the DATeCH2019 International Conference
Slides of the paper Automatic Semantic Text Tagging on Historical Lexica by Combining OCR and Typography Classification by Christian Reul, Sebastian Göttel, Uwe Springmann, Christoph Wick, Kay-Michael Würzner and Frank Puppe at the 3rd Edition of the DATeCH2019 International Conference
This document describes the SOS system for segmenting, stemming, and standardizing Arabic text. It presents the challenges of processing Arabic cultural heritage texts which contain orthographic variations. The system uses gradient boosting machines and achieves state-of-the-art performance on segmentation and derives stemming as a byproduct. It also standardizes orthography with high accuracy, which further improves segmentation. The system addresses issues like hamza forms and letter confusions that previous systems did not handle well.
Assessment and Planning in Educational technology.pptxKavitha Krishnan
In an education system, it is understood that assessment is only for the students, but on the other hand, the Assessment of teachers is also an important aspect of the education system that ensures teachers are providing high-quality instruction to students. The assessment process can be used to provide feedback and support for professional development, to inform decisions about teacher retention or promotion, or to evaluate teacher effectiveness for accountability purposes.
Thinking of getting a dog? Be aware that breeds like Pit Bulls, Rottweilers, and German Shepherds can be loyal and dangerous. Proper training and socialization are crucial to preventing aggressive behaviors. Ensure safety by understanding their needs and always supervising interactions. Stay safe, and enjoy your furry friends!
A Strategic Approach: GenAI in EducationPeter Windle
Artificial Intelligence (AI) technologies such as Generative AI, Image Generators and Large Language Models have had a dramatic impact on teaching, learning and assessment over the past 18 months. The most immediate threat AI posed was to Academic Integrity with Higher Education Institutes (HEIs) focusing their efforts on combating the use of GenAI in assessment. Guidelines were developed for staff and students, policies put in place too. Innovative educators have forged paths in the use of Generative AI for teaching, learning and assessments leading to pockets of transformation springing up across HEIs, often with little or no top-down guidance, support or direction.
This Gasta posits a strategic approach to integrating AI into HEIs to prepare staff, students and the curriculum for an evolving world and workplace. We will highlight the advantages of working with these technologies beyond the realm of teaching, learning and assessment by considering prompt engineering skills, industry impact, curriculum changes, and the need for staff upskilling. In contrast, not engaging strategically with Generative AI poses risks, including falling behind peers, missed opportunities and failing to ensure our graduates remain employable. The rapid evolution of AI technologies necessitates a proactive and strategic approach if we are to remain relevant.
Introduction to AI for Nonprofits with Tapp NetworkTechSoup
Dive into the world of AI! Experts Jon Hill and Tareq Monaur will guide you through AI's role in enhancing nonprofit websites and basic marketing strategies, making it easy to understand and apply.
This slide deck is intended for master's students (MIBS & MIFB) at UUM, and is also useful for readers interested in contemporary Islamic banking.
This presentation covers the basics of PCOS, its pathology and treatment, along with the Ayurvedic correlation of PCOS and the Ayurvedic line of treatment mentioned in the classics.
The simplified electron and muon model, Oscillating Spacetime: The Foundation...RitikBhardwaj56
Discover the Simplified Electron and Muon Model: A New Wave-Based Approach to Understanding Particles delves into a groundbreaking theory that presents electrons and muons as rotating soliton waves within oscillating spacetime. Geared towards students, researchers, and science buffs, this book breaks down complex ideas into simple explanations. It covers topics such as electron waves, temporal dynamics, and the implications of this model on particle physics. With clear illustrations and easy-to-follow explanations, readers will gain a new outlook on the universe's fundamental nature.
IMPACT Final Conference - Majlis Bremer Laamanen
1. CROWDSOURCING IN THE
DIGITALKOOT PROJECT
Majlis Bremer-Laamanen
IMPACT 24TH OF OCTOBER, 2011
Microtask.com:
Digitalkoot: Making Old Archives Accessible Using Crowdsourcing by
Otto Chrons and Sami Sundell,
Discussions: Managing Director Harri Holopainen
harri@microtask.com
2. The Centre for Preservation and Digitisation: statistics
• Established in 1990
• Digitisation started in 1998
• Over 50 employees
• Yearly average (past three years):
– Digitisation: 1.3 million pages
– Audio digitisation and cataloguing: 1,300 unique music cassettes and their sleeves
– Microfilm production: 1.3 million exposures
– Conservation: 10,000–15,000 units
3. ENRICHING CONTENT
(http://digi.nationallibrary.fi, http://www.doria.fi/handle/10024/4194)
• Newspapers -> over 2 million pages, the Historical Newspaper Library
• Journals -> over 2.7 million pages, free to 1910, in all legal deposit libraries to 1944
• Books -> travel, novels, dissertations from the 17th century, Save the Book
• Ephemera -> industrial price lists
• Sound -> national sound archive, C-cassettes
• Interest groups: the creators, users and contributors of the material
4. Context for mass digitisation and crowdsourcing
Process diagram: physical objects are transferred to the Centre for Preservation and Digitisation, prepared for digitisation, digitised and post-processed, with temporary physical storage for the objects along the way; the digitised objects are then retrieved and made accessible to the client.
Mass digitisation activities in the most cost-effective manner:
Newspapers, books, journals, ephemera, audio:
• Logistics for physical items
• Process for digital objects: network services and long-term preservation
• Metadata METS/ALTO: capturing through the process
• Metadata development: User experience and crowdsourcing
• Customizing of the tracking systems (CCS, Item Tracking, Scan Client)
• Operational environment: scaling architecture and implementation
5. DIGITALKOOT
DIGI = TO DIGITISE
TALKOOT = PEOPLE GATHERING TO WORK TOGETHER
VOLUNTARILY (WITHOUT PAYMENT)
FIRST EXPERIENCE 2011:
DIGITALKOOT: correction of OCR through gamification ("THE MOLE HUNT" by Microtask.com):
– People can spend hours on games
– Turning useful activities into games
– Activities can be rewarded with scores, achievements and social benefits
From February 8th to September 15th, 2011: about 80,000 visitors, 4,000 hours of effective game time, and more than 5 million tasks.
6. CHALLENGES
Meaningful tasks without breaking the flow of the game
Real-time feedback – many simultaneous players doing
the same task
Build a bridge to save the moles from falling down =>
– Correct typing gives you a block for the bridge
– Incorrect typing is punished by an explosion
15. GAMIFICATION CHALLENGES
Balancing game play elements with task completion speed and
accuracy
Keeping players motivated and enlarging the audience
Introducing meaningful tasks into the game without breaking the game play mechanisms
Instant feedback on players' actions (simultaneous players)
• pressure to adapt to varying feedback situations/latencies
16. POSITIVE EFFECT OF VERIFICATION
”The wisdom of the crowds”
• includes answers from possible spammers
Game start: verification tasks only
Accurate work shown => verification lowered in phases, never to zero
Verification tasks are created automatically:
• A randomly selected task is sent to several players: they all have to agree on the result => verification task
17. VERIFICATION OF THE OCR
Players and their pace cannot be synchronized.
Verification tasks in the task stream:
• The rate fed to players varies according to the number of active players
• The system knows the answer: game play is improved by fast feedback
• Downside: no new information is produced
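The verification scheme described on slides 16 and 17 can be sketched in a few lines of Python. This is a minimal illustration, not Digitalkoot's actual implementation: the function names, the agreement threshold and the idea of a tunable `verification_rate` are assumptions drawn from the slides.

```python
import random
from collections import Counter

def route_task(work_tasks, verification_pool, verification_rate):
    """Pick the next task for a player. With probability
    `verification_rate`, serve a verification task whose correct
    answer is already known (fast feedback, spammer detection);
    otherwise serve real OCR correction work."""
    if random.random() < verification_rate:
        return random.choice(verification_pool), True
    return random.choice(work_tasks), False

def agreement(answers, min_votes=3):
    """A correction is accepted once enough players agree on it;
    an agreed answer can then be reused as a new verification task.
    Returns the agreed answer, or None while players still disagree."""
    answer, votes = Counter(answers).most_common(1)[0]
    if votes >= min_votes and votes / len(answers) > 0.5:
        return answer
    return None
```

At game start the stream would be all verification tasks (`verification_rate = 1.0`), lowered in phases as a player proves accurate, but never to zero.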
18. USERS: February 8th to March 31st, 2011
31,816 visitors, 4,768 players, 2,740 hours of game time, 2.5 million tasks.
1% via the Internet, 99% via Facebook
Half of the users were men.
Game time: from seconds to over 100 hours (altogether).
Median time: 9 minutes.
Women: over 13 minutes and 54% of the tasks
The hardest-working top 4 were all men
19. ACCURACY
The OCR system's confidence threshold was 0.8 => human correction for about 30% of the words
Random selection of 2 articles:
• 1,467 words, Digitalkoot result: only 14 mistakes / 228 OCR errors
• 516 words, Digitalkoot result: 1 mistake / 118 OCR errors
• => well over 99% accuracy possible through gamification
Spammer play:
• One player: 1.5 hours and 5,692 tasks; detected by the verification system, only 4 tasks were accepted
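The accuracy figures above rest on a simple routing rule: words whose OCR confidence falls below the 0.8 threshold are sent to human correction. A minimal sketch of that rule, with an assumed function name and `(word, confidence)` data shape:

```python
def split_for_correction(ocr_words, threshold=0.8):
    """Accept words the OCR engine is confident about; turn
    low-confidence words (roughly 30% in Digitalkoot's material)
    into crowdsourcing tasks for human correction.
    `ocr_words` is a list of (word, confidence) pairs."""
    accepted, needs_human = [], []
    for word, confidence in ocr_words:
        if confidence >= threshold:
            accepted.append(word)
        else:
            needs_human.append(word)
    return accepted, needs_human
```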
20. Enriching Digitisation Production Processes, METS Profiles: a new development platform
Process diagram, from physical collections (source material) to a comprehensive digital resource:
• Source material: newspapers, serials, books, parchments, notes, maps, audio
• Cataloguing: two bibliographic records; descriptive metadata MARC21/MODS (MARCXML)
• Scanning: JPEG2000
• Post-processing: OCR text as ALTO XML; structural metadata METS/ALTO; mark-up of articles, illustrations and poems for a comprehensive level of digital collections
• Administrative/technical metadata: MIX/PREMIS
• METS export: standards- and OAI-PMH-compliant METS SIP packages; packages include JPEG2000, OCR text as ALTO XML, PDF, JPEG (150), METS XML and MARCXML
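As a rough illustration of the export step, the sketch below builds a heavily simplified METS wrapper with Python's standard library. A real METS SIP of the kind described here would also carry a structMap, MODS/MARC descriptive metadata and MIX/PREMIS sections; the function name, identifiers and file names are made up.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"

def build_minimal_sip(objid, files):
    """Wrap the files of one digitised object (JPEG2000 master,
    ALTO XML, PDF, ...) in a bare-bones METS fileSec."""
    ET.register_namespace("mets", METS_NS)
    ET.register_namespace("xlink", XLINK_NS)
    mets = ET.Element(f"{{{METS_NS}}}mets", {"OBJID": objid})
    file_sec = ET.SubElement(mets, f"{{{METS_NS}}}fileSec")
    group = ET.SubElement(file_sec, f"{{{METS_NS}}}fileGrp", {"USE": "master"})
    for i, (path, mimetype) in enumerate(files, start=1):
        file_el = ET.SubElement(
            group, f"{{{METS_NS}}}file",
            {"ID": f"FILE{i}", "MIMETYPE": mimetype})
        # FLocat points at the actual file; METS uses xlink:href for this.
        ET.SubElement(file_el, f"{{{METS_NS}}}FLocat",
                      {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": path})
    return ET.tostring(mets, encoding="unicode")
```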
21. IN THE MEDIA
– Until March 31st, over 30 articles all around the world: the New York Times…
– Television appearances ongoing
– Helsingin Sanomat: HS talkoot, using the National Library's digitised newspaper material from the Historical Newspaper Library > advertising Digitalkoot, e.g. September 15th
– Influenced user interest => stabilisation at 300 individual users per week
22. NEXT
1) Marking of articles and/or images
2) Indexing articles and/or images
23. KUVATALKOOT
Goal: a sophisticated user experience
Collections discovery and reuse of digital content by researchers and people at large:
Researchers will get better systematic coverage of images and articles in published printed material.
(Image: Luonnon-kirja ala-alkeiskouluin tarpeeksi / Z. Topelius, 1868)