The Conference 2012 - CLARIN
  • Quantitative Analysis of Culture Using Millions of Digitized Books (J.-B. Michel et al., 2010, Science, DOI: 10.1126/science.1199644)
  • Automatic interruption analysis: which party interrupted which party, and how often (Maarten Marx, UvA)
  • Transcript

    • 1. Language Resources and Technology Infrastructure for the Humanities and the Social Sciences in the Netherlands Arjan van Hessen
    • 2. State of the Technology: Language and Speech Technology is (nearly) mature. Many applications are available. Most of it is usable (although not perfect), but...
    • 3. Unused Technology & Resources: Lack of standardization is killing. Many scholars are not aware of the HLT & Resources. It is less used than expected. A priori technical knowledge is still necessary. Use is too dependent on “friends” in the field.
    • 4. Research life cycle (diagram labels): New Idea; Publications; Research; ?; Tuning; Building; Cultural Heritage Institution(s)
    • 5. Unused Technology & Resources CAR
    • 6. HLT & CHI paths (diagram labels): Language processing; Machine learning; CATCH; Cultural Heritage Institutions; Humanities
    • 7. After the project
    • 8. CLARIN-EU (2007-2012), CLARIN-NL (2009-2015), CLARIN-ERIC (2012-xxxx), CLARIAH (2015-…): Infrastructure program for the Humanities
    • 9. Issues to address: (1) Finding the users; (2) Identification of their needs/problems; (3) Do our solutions correspond to their problems?; (4) Usability of tools: can they use them?; (5) Visualisation; (6) Tutorials and web material (movies, courses); (7) Sustainability of tools and resources
    • 10. How to identify and convince potential users (1. FINDING THE USERS)
    • 11. Humanities enter a New Era: Huge amounts of digital data are becoming available. Hardware allows this, and many tools are available and under development. The traditional “lonely scholar” of Spitzweg no longer suffices. Big data, supported by automated methods.
    • 12. User Surveys: Go out to ask potential users. User survey in the Netherlands (2010)
    • 13. What do they need? (2. IDENTIFICATION OF THEIR NEEDS/PROBLEMS)
    • 14. User attraction cycle: Finding new users; Convincing these users to participate; Listening to the users; Support the users; Train these users in the use of all those wonderful tools
    • 15. What to prevent in order to NOT scare off (potential) users (3. DO OUR SOLUTIONS CORRESPOND TO THEIR PROBLEMS?)
    • 16. The CLARIN dream: Give me digital copies of all contemporary documents in European archives that discuss the Great Plague of England (1348-1350). Give me all negative articles about Catholics in the Fryske Courant (1868-1924). Find European TV news interviews that involve discussions about Geert Wilders.
    • 17. The CLARIN nightmare in 6 sleepless nights, night 1: Give me digital copies of all contemporary documents in European archives that discuss the Great Plague of England (1348-1350). “All” means from all countries and all archives, not just some archives in some (9) countries that happen to be in CLARIN. If contemporary docs exist in digital form at all, they are probably pictures: how do we get access to the content? Can we rely on standardized metadata to find them? Many of the docs may be in Latin: can we handle that, and what about the other languages? How would a scholar know how to formulate this query? How to present results?
    • 18. The gearbox syndrome (4. USABILITY OF TOOLS)
    • 19. The gearbox syndrome explained: Humanities scholar with a problem, waiting for a solution. First HLT researcher offering help.
    • 20. The gearbox syndrome explained: Humanities scholar with a problem, waiting for a solution. First generation named entity recognizer (rule based).
    • 21. The gearbox syndrome explained: Humanities scholar with a problem, waiting for a solution. Second HLT researcher offering help.
    • 22. The gearbox syndrome explained: Humanities scholar with a problem, waiting for a solution. Second generation named entity recognizer (statistics based).
    • 23. The gearbox syndrome explained: Humanities scholar with a problem, waiting for a solution. Third HLT researcher offering help.
    • 24. The gearbox syndrome explained: Humanities scholar with a problem, waiting for a solution. LREC 2012 paper about next generation named entity recognizer.
    • 25. The gearbox syndrome explained
    • 26. Making understandable interfaces
    • 27. A picture says more than 1000 words. Easy visualization fosters data analysis. Nice visualization eases use of analysis tools. Nice-to-look-at tools help to reach out to the community. (5. VISUALIZATION)
    • 28. Who answered which words: visualizing word frequency information in letters. C. Culy. 2012. "Some challenges of language and linguistic data for information visualization." Invited keynote presentation at Advanced Visual Methods for Linguistics. University of York, September 7, 2012.
    • 31. Parliamentary Debate: Which party interrupted which other party, and how often?
    • 32. Create and publish web tutorials. Publish recorded lectures about CLARIN-specific topics. Make and publish showcases. (6. TUTORIALS AND WEB MATERIAL)
    • 33. Web videos
    • 34. Showcases
    • 35. Resources and tools must be accessible after a project finishes. Data and tools must use internationally accepted standards. Easy access via federated login. (7. SUSTAINABILITY OF TOOLS AND RESOURCES)
    • 36. CLARIN Centres
    • 37. Conclusion: CLARIN offers a good and sustainable infrastructure for long-term use of both Resources and Tools. Participating in CLARIN gives you access to enclosure tools, standardized metadata, tools for metadata, and the CLARIN community. Give other groups/institutions access to your data... if you want.
    • 38. So join us! www.clarin.nl THANK YOU!