De conferentie 2012 - CLARIN

  • Quantitative Analysis of Culture Using Millions of Digitized Books (J.-B. Michel et al., 2010, Science, DOI: 10.1126/science.1199644)
  • Automatic interruption analysis: which party interrupted which party, and how often (Maarten Marx, UvA)

    1. 1. Language Resources and Technology Infrastructure for the Humanities and the Social Sciences in the Netherlands Arjan van Hessen
    2. 2. State of the Technology: Language and speech technology is (nearly) mature. Many applications are available. Most of it is usable (although not perfect), but…
    3. 3. Unused Technology & Resources: Lack of standardization is killing. Many scholars are not aware of the HLT & resources. It is less used than expected. A priori technical knowledge is still necessary. Use depends too much on “friends” in the field.
    4. 4. Research Life cycle New Idea Publications Research ? Tuning Building Cultural Heritage Institution(s)
    5. 5. Unused Technology & Resources CAR
    6. 6. HLT & CHI paths: Language processing, Machine learning, CATCH, Cultural Heritage Institutions, Humanities
    7. 7. After the project
    8. 8. CLARIN-EU (2007-2012), CLARIN-NL (2009-2015), CLARIN-ERIC (2012-xxxx), CLARIAH (2015-…). Infrastructure program for the Humanities
    9. 9. Issues to address: (1) Finding the users; (2) Identification of their needs/problems; (3) Do our solutions correspond to their problems? (4) Usability of tools: can they use them? (5) Visualisation; (6) Tutorials and web material (movies, courses); (7) Sustainability of tools and resources
    10. 10. 1. FINDING THE USERS: How to identify and convince potential users
    11. 11. Humanities enter a New Era: Huge amounts of digital data are becoming available. Hardware allows this, and many tools are available and under development. Traditionally: Spitzweg’s “lonely scholar”, which no longer suffices. Big data, supported by automated methods.
    12. 12. User Surveys: Go out to ask potential users. User survey in the Netherlands (2010)
    13. 13. 2. IDENTIFICATION OF THEIR NEEDS/PROBLEMS: What do they need?
    14. 14. User attraction cycle: Finding new users; Convincing these users to participate; Listening to the users; Support the users; Train these users in the use of all those wonderful tools
    15. 15. 3. DO OUR SOLUTIONS CORRESPOND TO THEIR PROBLEMS? What to avoid in order NOT to scare off (potential) users
    16. 16. The CLARIN dream: “Give me digital copies of all contemporary documents in European archives that discuss the Great Plague of England (1348-1350).” “Give me all negative articles about Catholics in the Fryske Courant (1868-1924).” “Find European TV news interviews that involve discussions about Geert Wilders.”
    17. 17. The CLARIN nightmare in 6 sleepless nights – night 1: “Give me digital copies of all contemporary documents in European archives that discuss the Great Plague of England (1348-1350).” “All” means from all countries and all archives, not just some archives in some (9) countries that happen to be in CLARIN. If contemporary docs exist in digital form at all, they are probably pictures: how do we get access to the content? Can we rely on standardized metadata to find them? Many of the docs may be in Latin: can we handle that, and what about the other languages? How would a scholar know how to formulate this query? How to present results?
    18. 18. 4. USABILITY OF TOOLS: The gearbox syndrome
    19. 19. The gearbox syndrome explained: Humanities scholar with a problem, waiting for a solution. First HLT researcher offering help
    20. 20. The gearbox syndrome explained: Humanities scholar with a problem, waiting for a solution. First generation named entity recognizer (rule based)
    21. 21. The gearbox syndrome explained: Humanities scholar with a problem, waiting for a solution. Second HLT researcher offering help
    22. 22. The gearbox syndrome explained: Humanities scholar with a problem, waiting for a solution. Second generation named entity recognizer (statistics based)
    23. 23. The gearbox syndrome explained: Humanities scholar with a problem, waiting for a solution. Third HLT researcher offering help
    24. 24. The gearbox syndrome explained: Humanities scholar with a problem, waiting for a solution. LREC 2012 paper about next generation named entity recognizer
    25. 25. The gearbox syndrome explained
    26. 26. Making understandable interfaces
    27. 27. 5. VISUALIZATION: A picture says more than 1000 words. Easy visualization fosters data analysis. Nice visualization eases use of analysis tools. Nice-to-look-at tools help to reach out to the community.
    28. 28. Who answered which words: visualizing word frequency information in letters. C. Culy. 2012. "Some challenges of language and linguistic data for information visualization." Invited keynote presentation at Advanced Visual Methods for Linguistics. University of York, September 7, 2012.
    29. 29. 29
    30. 30. 30
    31. 31. Parliamentary Debate: Which party interrupted which other party, and how often?
    32. 32. 6. TUTORIALS AND WEB MATERIAL: Create and publish web tutorials. Publish recorded lectures about CLARIN-specific topics. Make and publish showcases.
    33. 33. Web videos
    34. 34. Showcases
    35. 35. 7. SUSTAINABILITY OF TOOLS AND RESOURCES: Resources and tools must remain accessible after a project finishes. Data and tools must use internationally accepted standards. Easy access via federated login.
    36. 36. CLARIN Centres
    37. 37. Conclusion: CLARIN offers a good and sustainable infrastructure for long-term use of both resources and tools. Participating in CLARIN gives you access to enclosure tools, standardized metadata, tools for metadata, and the CLARIN community. Give other groups/institutions access to your data… if you want.
    38. 38. So join us! www.clarin.nl THANK YOU!
