De conferentie 2012 - CLARIN
  • Quantitative Analysis of Culture Using Millions of Digitized Books (J.-B. Michel et al., 2010, Science, DOI: 10.1126/science.1199644)
  • Automatic interruption analysis: which party interrupted which party, and how often (Maarten Marx, UvA)

Transcript

  • 1. Language Resources and Technology Infrastructure for the Humanities and the Social Sciences in the Netherlands Arjan van Hessen
  • 2. State of the Technology: Language and Speech Technology is (nearly) mature. Many applications are available. Most of it is usable (although not perfect), but…
  • 3. Unused Technology & Resources: Lack of standardization is killing. Many scholars are not aware of the HLT & Resources. It is less used than expected. A-priori technical knowledge is still necessary. Use is too dependent on “friends” in the field.
  • 4. Research life cycle: New Idea, Publications, Research, ?, Tuning, Building, Cultural Heritage Institution(s)
  • 5. Unused Technology & Resources CAR
  • 6. HLT & CHI paths: Language processing, Machine learning, CATCH, Cultural Heritage Institutions, Humanities
  • 7. After the project
  • 8. CLARIN-EU (2007-2012), CLARIN-NL (2009-2015), CLARIN-ERIC (2012-xxxx), CLARIAH (2015-…): Infrastructure program for the Humanities
  • 9. Issues to address: 1. Finding the users. 2. Identification of their needs/problems. 3. Do our solutions correspond to their problems? 4. Usability of tools: can they use them? 5. Visualisation. 6. Tutorials and web material (movies, courses). 7. Sustainability of tools and resources.
  • 10. How to identify and convince potential users. 1. FINDING THE USERS
  • 11. Humanities enter a New Era: Huge amounts of digital data are becoming available. Hardware allows this, and many tools are available and under development. Spitzweg’s traditional “lonely scholar” no longer suffices. Big data, supported by automated methods.
  • 12. User Surveys: Go out to ask potential users. User survey in the Netherlands (2010).
  • 13. What do they need? 2. IDENTIFICATION OF THEIR NEEDS/PROBLEMS
  • 14. User attraction cycle: Finding new users. Convincing these users to participate. Listening to the users. Train these users in the use of all those wonderful tools. Support the users.
  • 15. What to avoid in order NOT to scare off (potential) users. 3. DO OUR SOLUTIONS CORRESPOND TO THEIR PROBLEMS?
  • 16. The CLARIN dream: “Give me digital copies of all contemporary documents in European archives that discuss the Great Plague of England (1348-1350).” “Give me all negative articles about Catholics in the Fryske Courant (1868-1924).” “Find European TV news interviews that involve discussions about Geert Wilders.”
  • 17. The CLARIN nightmare in 6 sleepless nights, night 1: “Give me digital copies of all contemporary documents in European archives that discuss the Great Plague of England (1348-1350).” “All” means from all countries and all archives, not just some archives in some (9) countries that happen to be in CLARIN. If contemporary docs exist in digital form at all, they are probably pictures: how do we get access to the content? Can we rely on standardized metadata to find them? Many of the docs may be in Latin: can we handle that, and what about the other languages? How would a scholar know how to formulate this query? How to present results?
  • 18. The gearbox syndrome. 4. USABILITY OF TOOLS
  • 19. The gearbox syndrome explained: Humanities scholar with a problem, waiting for a solution. First HLT researcher offering help.
  • 20. The gearbox syndrome explained: Humanities scholar with a problem, waiting for a solution. First generation named entity recognizer (rule based).
  • 21. The gearbox syndrome explained: Humanities scholar with a problem, waiting for a solution. Second HLT researcher offering help.
  • 22. The gearbox syndrome explained: Humanities scholar with a problem, waiting for a solution. Second generation named entity recognizer (statistics based).
  • 23. The gearbox syndrome explained: Humanities scholar with a problem, waiting for a solution. Third HLT researcher offering help.
  • 24. The gearbox syndrome explained: Humanities scholar with a problem, waiting for a solution. LREC 2012 paper about next generation named entity recognizer.
  • 25. The gearbox syndrome explained
  • 26. Making understandable interfaces
  • 27. A picture says more than 1000 words. Easy visualization fosters data analysis. Nice visualization eases use of analysis tools. Nice-to-look-at tools help to reach out to the community. 5. VISUALIZATION
  • 28. Who answered which words: visualizing word frequency information in letters. C. Culy. 2012. “Some challenges of language and linguistic data for information visualization.” Invited keynote presentation at Advanced Visual Methods for Linguistics. University of York, September 7, 2012.
  • 29. 29
  • 30. 30
  • 31. Parliamentary Debate: Which party interrupted which other party and how often?
  • 32. Create and publish web tutorials. Publish recorded lectures about CLARIN-specific topics. Make and publish showcases. 6. TUTORIALS AND WEB MATERIAL
  • 33. Web videos
  • 34. Showcases
  • 35. Resources and tools must be accessible after a project finishes. Data and tools must use internationally accepted standards. Easy access via federated login. 7. SUSTAINABILITY OF TOOLS AND RESOURCES
  • 36. CLARIN Centres
  • 37. Conclusion: CLARIN offers a good and sustainable infrastructure for long-term use of both Resources and Tools. Participating in CLARIN gives you access to enclosure tools, standardized metadata, tools for metadata, and the CLARIN community. Give other groups/institutions access to your data… if you want.
  • 38. So join us! www.clarin.nl THANK YOU!