IMPACT Final Conference - Majlis Bremer Laamanen
 


Crowdsourcing in the Digitalkoot project, presented by Majlis Bremer-Laamanen of the National Library of Finland.


    Presentation Transcript

    • CROWDSOURCING IN THE DIGITALKOOT PROJECT
      Majlis Bremer-Laamanen, IMPACT, 24th of October, 2011
      Microtask.com: "Digitalkoot: Making Old Archives Accessible Using Crowdsourcing" by Otto Chrons and Sami Sundell
      Discussions: Managing Director Harri Holopainen, harri@microtask.com
    • The Centre for Preservation and Digitisation: statistics
      • Established in 1990
      • Digitisation started in 1998
      • Over 50 employees
      • Yearly average (past three years):
      • Conservation: 10,000-15,000 units
      • Digitisation: 1.3 million pages
      • Audio digitisation and cataloguing: 1,300 unique music cassettes and the sleeves
      • Microfilm production: 1.3 million exposures
    • ENRICHING CONTENT (http://digi.nationallibrary.fi, http://www.doria.fi/handle/10024/4194)
      • Newspapers -> 2 million pages, the Historical Newspaper Library
      • Journals -> 2.7 million pages, free to 1910, in all legal deposit libraries to 1944
      • Books -> travel, novels, dissertations (17th century), Save the Book
      • Ephemera -> industrial price lists
      • Sound -> national sound archive, C-cassettes
      • Interest groups: the creators, users and contributors of the material
    • Context for mass digitisation and crowdsourcing
      [Diagram: workflow around the Centre for Preservation and Digitisation. Labels: Client, Accessibility, Retrieval, Physical objects, Preparation for digitisation, Digitisation, Post-processing, Temporary storage for digitised objects, Transferring]
      Mass digitisation activities in the most cost-effective manner (newspapers, books, journals, ephemera, audio):
      • Logistics for physical items
      • Process for digital objects: network services and long-term preservation
      • Metadata (METS, ALTO): capturing through the process
      • Metadata development: user experience and crowdsourcing
      • Customizing of the tracking systems (CCS, Item Tracking, Scan Client)
      • Operational environment: scaling architecture and implementation
    • DIGITALKOOT
      DIGI = TO DIGITISE
      TALKOOT = PEOPLE GATHERING TO WORK TOGETHER VOLUNTARILY (WITHOUT PAYMENT)
      FIRST EXPERIENCE 2011: DIGITALKOOT, correction of OCR by gamification, turning useful activities into games: "THE MOLE HUNT" by Microtask.com.
      – People can spend hours on games
      – Turning useful activities into games
      – Activities can be rewarded with scores, achievements and social benefits
      From February 8th to September 15th, 2011: about 80,000 visitors, 4,000 hours of effective game time. More than 5 million tasks.
    • CHALLENGES
      Meaningful tasks without breaking the flow of the game
      Real-time feedback: many simultaneous players doing the same task
      Build a bridge to save the moles from falling down =>
      – Correct typing gives you a block for the bridge
      – An incorrect answer is punished by an explosion
    • DIGITALKOOT: Mole Hunt
    • Right or wrong?
    • DIGITALKOOT: Mole Bridge
    • A bridge has been built…
    • To the next level?
    • Changing sceneries
    • When a mole falls
    • Incorrect answer exploding
    • GAMIFICATION CHALLENGES
      Balancing game play elements with task completion speed and accuracy
      Keeping people motivated and enlarging the audience
      Introduction of meaningful tasks into the game without breaking game play mechanisms
      Instant feedback on players' actions (simultaneous players)
      • pressure to adapt to varying feedback situations/latencies
    • POSITIVE EFFECT OF VERIFICATION
      "The wisdom of the crowds"
      • includes answers from possible spammers
      Game start: verification tasks only
      Accurate work shown => verification lowered in phases, never zero
      Verification tasks are created automatically:
      • A randomly selected task is sent to several players: all have to agree on the result => verification task
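The automatic creation of verification tasks described on this slide can be illustrated with a minimal sketch. It is not the Microtask implementation: the function name `promote_to_verification` and the threshold of three players (standing in for "several") are assumptions made only for this example.

```python
def promote_to_verification(answers, min_players=3):
    """Return the agreed answer if enough players answered and all of them
    agree; otherwise return None.

    min_players=3 is an assumed value; the slide only says "several players".
    """
    if len(answers) < min_players:
        return None
    # Ignore surrounding whitespace so "talo " and "talo" count as the same answer.
    distinct = {answer.strip() for answer in answers}
    if len(distinct) == 1:
        return distinct.pop()
    return None


# Unanimous answers turn the task into a verification task with a known answer.
print(promote_to_verification(["talo", "talo", "talo"]))
# A single disagreement means the task is not promoted.
print(promote_to_verification(["talo", "ta1o", "talo"]))
```

Requiring full agreement is a strict rule: one careless or malicious answer blocks promotion, which keeps the pool of known answers clean at the cost of discarding some good tasks.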
    • VERIFICATION OF THE OCR
      Players and their pace cannot be synchronized.
      Verification tasks are added to the task stream:
      • The number fed to players varies according to the number of active players
      • The system knows the answer: the game play is improved by fast feedback
      • Downside: no new information produced
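One way to picture how verification tasks might be mixed into a player's task stream is sketched below. The slides state only that a game starts with verification tasks alone and that the share is lowered in phases but never reaches zero. The phase thresholds, the shares, and the names `PHASES`, `verification_share` and `next_task_kind` are all hypothetical, and the sketch does not model the dependence on the number of active players.

```python
import random

# Hypothetical phases: (correct verification answers so far, share of
# verification tasks in the stream). The last share stays above zero.
PHASES = [(0, 1.0), (10, 0.5), (30, 0.2), (100, 0.05)]


def verification_share(correct_answers):
    """Return the share of verification tasks for a player who has given
    `correct_answers` correct verification answers."""
    share = PHASES[0][1]
    for threshold, phase_share in PHASES:
        if correct_answers >= threshold:
            share = phase_share
    return share


def next_task_kind(correct_answers, rng=random):
    """Pick the kind of the next task: 'verification' (answer already known,
    so feedback is immediate) or 'new' (produces new information)."""
    if rng.random() < verification_share(correct_answers):
        return "verification"
    return "new"


# A new player sees only verification tasks; a trusted player sees few.
print(verification_share(0), verification_share(150))
```

Because a verification task has a known answer, the game can respond instantly, which is the fast feedback the slide mentions; the price is that such a task yields no new corrected text.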
    • USERS: February 8th to March 31st, 2011
      31,816 visitors, 4,768 players, 2,740 hours of game time, 2.5 million tasks.
      1 % via Internet, 99 % via Facebook
      Half of the users were men.
      Game time: seconds to over 100 hours (altogether).
      Median time: 9 minutes.
      Women: >13 minutes and 54 % of the tasks
      Hardest working top 4 were all men
    • ACCURACY
      OCR system: 0.8 confidence about accuracy => human correction in 30 %
      Random selection of 2 articles:
      • 1,467 words: Digitalkoot result only 14 mistakes / 228 in the OCR
      • 516 words: Digitalkoot result 1 mistake / 118 in the OCR
      • >> well over 99 % possible by gamification
      Spammer play:
      • One player, with 1.5 hours and 5,692 tasks, was detected by the verification system and only 4 tasks were accepted
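The accuracy claim for the two sample articles above can be checked with simple arithmetic. The sketch assumes that "14 mistakes / 228" means 14 wrong words after Digitalkoot against 228 wrong words in the raw OCR output, and that accuracy is measured per word; the helper name `word_accuracy` is made up for this example.

```python
def word_accuracy(words, mistakes):
    """Share of correct words, assuming each mistake affects one word."""
    return 1 - mistakes / words


# (total words, mistakes in raw OCR, mistakes after Digitalkoot), from the slide
articles = [
    (1467, 228, 14),
    (516, 118, 1),
]

for words, ocr_mistakes, digitalkoot_mistakes in articles:
    before = word_accuracy(words, ocr_mistakes) * 100
    after = word_accuracy(words, digitalkoot_mistakes) * 100
    print(f"{words} words: OCR {before:.2f} %, Digitalkoot {after:.2f} %")
```

Under these assumptions both articles end up above 99 % word accuracy after correction, starting from roughly 84 % and 77 % in the raw OCR.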
    • Enriching Digitisation Production Processes, METS Profiles: a new development platform
      [Diagram: production pipeline from physical collections to digital collections. Recoverable labels:]
      • Source material, physical collections: newspapers, serials, books, parchments, notes, maps, audio
      • Cataloguing: two bibliographic records
      • Scanning, OCR, post-processing, METS export
      • Packages include: JPEG2000, TXT as ALTO XML, PDF, JPEG (150), METS XML, MARCXML
      • Descriptive metadata: MARC21/MODS
      • Administrative/technical metadata: MIX/PREMIS
      • Structural metadata: METS, ALTO
      • Standards & OAI-PMH compliant METS SIP packages
      • Comprehensive level of mark-up: articles, illustrations, poems
      • Digital resource, digital collections
    • IN THE MEDIA
      - Until March 31st, over 30 articles all around the world: New York Times…
      - Television appearances ongoing
      - Helsingin Sanomat: HS talkoot using the National Library's digitised newspaper material (Historical Newspaper Library) > advertising Digitalkoot, e.g. September 15th
      - Influenced user interest => stabilisation to 300 individual users per week
    • NEXT
      1) Marking of articles and/or images
      2) Indexing articles and/or images
    • KUVATALKOOT
      Goal: sophisticated user experience
      Collections discovery and reuse of digital content by researchers and people at large: researchers will get better systematic coverage of images and articles in published printed material.
      [Image: Luonnon-kirja ala-alkeiskouluin tarpeeksi / Z. Topelius, 1868]