Challenges for the Language Technology Industry

Presentation at the LT-Innovate Summit, Brussels, June 24-25 2014.
http://www.lt-innovate.eu/event/item/lt-innovate-summit-2014-brussels

  • Les Miserables: Victor Hugo’s handwritten manuscripts: http://www.europeana.eu/portal/record/9200103/5372912AF66AB529E188218BC1F747E75EB1A18F.html
    BnF, public domain
    Matisse ‘53 in the form of a double helix’ http://www.europeana.eu/portal/record/9200104/F8D60AB9136C8A59B59DF1CFEC278A6CABA8B0C6.html
    The Wellcome Library (CC-BY-NC-ND)
    ‘söprűtánc’ – Hungarian traditional dance http://www.europeana.eu/portal/record/08901/E1A7B01BE4AED87FD239672F4F3941F52262D6B2.html
    Hungarian Academy of Sciences Institute for Musicology, public domain
    ‘Neurologico reggae’ Music album http://www.europeana.eu/portal/record/08901/ADC241BCBF8470988DBA6EEAFCF13F14D88E5534.html
    DISMARC – EuropeanaConnect Paid Access
    ‘Castle of Kavala’ 3D exploration of a Greek castle http://www.europeana.eu/portal/record/2020703/05607B24D15BD516EE2B765F74CDA39C7427F7FB.html
    Cultural and Educational Technology Institute - Research Centre Athena, CARARE, CC-BY-NC-ND
  • All partners send us descriptions of their assets, which we aggregate in a single service
  • Germany 15.44%
    France 10.97%
    Netherlands 9.67%
    Sweden 9.44%
    Spain 9.98%
    UK 6.98%
    Norway 6.60%
    Italy 5.4%
    Ireland 4.04%
    Poland 4.02%
    Europe 3.95%
    Finland 2.95%
    Austria 2.05%
    Belgium 1.61%
    Hungary 1.26%
  • http://www.clef-initiative.eu/documents/71612/86374/CLEF2010wn-LogCLEF-StillerEt2010.pdf
  • Users from everywhere
    Data from everywhere
    Tools from everywhere
    http://europeana.eu/portal/record/2022347/B7C7D15C23C28EFD3FA25147ED3A580757CFBB04.html
    http://europeana.eu/portal/record/9200103/ark__12148_btv1b6921004c.html
    http://www.europeana.eu/portal/record/9200122/BibliographicResource_1000056116671.html
    http://www.europeana.eu/portal/record/2022608/DF_DF_13399.html
  • Challenges for the Language Technology Industry

    1. Challenges for the LT Industry
       Antoine Isaac, LT-Innovate Summit 2014, Brussels, June 25, 2014
    2. Europe’s platform to access cultural heritage
       Currently 33M objects
    3. Built on descriptive metadata from a broad, heterogeneous network
       Audiovisual collections, National Aggregators, Regional Aggregators, Archives, Thematic collections, Libraries
       Musées Lausannois, Culture.fr, The European Library, APEX, European Film Gateway, Europeana Fashion
       2,300 galleries, museums, archives and libraries
    4. Platform implies network
    5. Accessing items from 36 countries top 16
       Portal interface in 31 languages
       Metadata in 33 languages
    6. Serving Europe’s citizens
       5M visits on Europeana.eu
       7M Facebook impressions
       API use…
    7. Facilitating re-use on the language side?
       Our network needs automatic translation tools to address information needs all over Europe
    8. Gathering/linking existing multilingual data
    9. Related projects applying NLP tools
       E.g. a project (PATHS) developed techniques to enrich English and Spanish collections
       1) Identification of key entities
       2) Detection of (typed) similarities between objects, using metadata
       3) “Background links” to external resources such as Wikipedia
       4) Classification of object against a hierarchy of topics
       Applying these to other languages would require work
       1) -> requires language-specific tools (PoS tagging, lemmatization)
       2) -> straightforward to apply to new languages
       3) -> requires language-specific tools
       4) -> depends on (3) and on translation of some topics
       http://www.paths-project.eu/eng/Resources/Semantic-Enrichment-of-Cultural-Heritage-content-in-PATHS
    10. Language challenges for Digital Libraries
        • Typical queries are very short
          Average < 2 terms
        • Identification of query language is not easy, even manually
          39% of queries may belong to several languages
        • Plenty of named entities
          60% of queries are for persons & places
        Not only is it hard for queries: the same issues apply to the descriptive metadata
        Studies by Humboldt University on Europeana and The European Library
        http://www.clef-initiative.eu/documents/71612/86374/CLEF2010wn-LogCLEF-StillerEt2010.pdf
    11. Language issues at the scale of Europe
    12. Very diverse domains, probably with few training corpora available
        Tools, UCL Museums, CC-BY-NC-SA
        Paris, nouvelle machine à paver : [photographie de presse] / [Agence Rol], National Library of France, Public Domain
        St. Philip holding a book and St. James (the Less?) holding a book, National Library of the Netherlands, Public domain
        La paloma / O sole mio, Dalane Folkemuseum, CC0
    13. Relevant LT can come from everywhere in Europe, raising interoperability issues
    14. Resource problem
        Both for us and our partners - libraries, archives, museums
        • Not much money
        • Few technical experts
        • Emphasis on open source technology
        We can provide interesting challenges for the industry in terms of (open) data availability, users and scenarios.
        But we're not (yet) a market of the size of others
    15. Thank you!
        Antoine Isaac, antoine.isaac@europeana.eu, @EuropeanaEU
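
The point on the "Language challenges for Digital Libraries" slide, that a very short query often fits several languages at once, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the tiny word lists are invented, and the function name `candidate_languages` is hypothetical. It does not reproduce the method of the Humboldt University studies or anything Europeana runs. It simply returns every language whose toy vocabulary contains all tokens of the query.

```python
from typing import Dict, List, Set

# Toy vocabularies, invented for illustration only. A real system would
# rely on full dictionaries or trained language-identification models.
TOY_LEXICONS: Dict[str, Set[str]] = {
    "en": {"die", "art", "paris", "war", "map"},
    "de": {"die", "art", "paris", "war", "karte"},
    "fr": {"art", "paris", "carte", "guerre"},
}


def candidate_languages(query: str, lexicons: Dict[str, Set[str]]) -> List[str]:
    """Return the languages whose vocabulary contains every query token."""
    tokens = query.lower().split()
    if not tokens:
        return []
    return sorted(
        lang
        for lang, vocab in lexicons.items()
        if all(token in vocab for token in tokens)
    )


if __name__ == "__main__":
    for q in ["art", "Paris war", "carte guerre"]:
        print(q, "->", candidate_languages(q, TOY_LEXICONS))
```

With these toy lists, a one-word query such as "art" matches all three languages, and a place name such as "Paris" does nothing to narrow the choice, which mirrors the slide's observations about short queries and named entities.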
