Web Annotations – A Game Changer for Language Technology?

Georg Rehm, Felix Sasaki, and Aljoscha Burchardt. Web Annotations – A Game Changer for Language Technologies? I Annotate 2016, Berlin, Germany, May 19/20, 2016.


  1. Web Annotations – A Game Changer for Language Technologies?
     Georg Rehm, Felix Sasaki, Aljoscha Burchardt
     DFKI GmbH – Language Technology Lab, Berlin
  2. Language Technology
     •  Language Technology is a heterogeneous and evolving set of applications that involve the
        –  (semi-)automatic processing (analysis) or
        –  (semi-)automatic production
        of human language (written or spoken).
     •  Driven by NLP, CL, Linguistics, CompSci, CogSci, AI.
     •  Methods operate on language data (often web-scale).
     •  Rule-based tools, statistics (machine learning).
     •  Need for human experts to analyse and annotate data sets with highly specialised linguistic analysis information.
     Web Annotations and Language Technology – I Annotate 2016
  3. Selected LT Applications
     •  Spell checking, grammar checking
     •  Search engines (IR)
     •  Interactive personal assistants (Cortana, Siri etc.)
     •  Machine Translation
     •  Recommender systems
     •  Social media (analytics, streams)
     •  Knowledge-based systems
  4. Web Annotation Architecture
     Web annotation architecture: http://www.w3.org/annotation
     What is the relationship between Web Annotations and Language Technology?
  5. Web Annotation Architecture
     Content could be created by Language Technology fully automatically or in a semi-automatic way (text generation).
  6. Web Annotation Architecture
     Content could be analysed by Language Technology (semantic analysis, input for ML algorithms etc.).
  7. Web Annotation Architecture
     Especially in Social Media Analytics we are very interested in user-generated content (UGC), i.e., in comments and feedback: “What do users think of a certain product?” etc.
  8. Web Annotation Architecture
     •  Today, analysing UGC is difficult and costly (many heterogeneous sources, many different formats).
     •  A few established and widely used Web Annotation services would simplify Social Media Analytics (SMA) dramatically!
  9. Web Annotation Architecture
     We can also use Language Technology methods to create (or help create) annotations, for example, in a smart authoring scenario.
  10. LT and Web Annotations
      •  Analysis of web annotations and making use of web annotations through Language Technology:
         –  Arbitrary web annotations (i.e., unstructured text)
            •  No more crawling, aggregating, mapping!
         –  Dedicated LT-specific web annotations
            •  Annotating language data without any specialised stand-alone tools or data repositories!
      •  Generation of web annotations through Language Technology (e.g., to provide background information on important content; see, e.g., the Pundit use cases).
  11. Example Scenarios
      •  Two example scenarios to demonstrate how Language Technology and Web Annotations go together.
      •  Scenario 1 – Digital Curation Technologies: Semantification of content for curators of digital information
      •  Scenario 2 – Machine Translation: Web Annotations for High-Quality Machine Translation
  12. Digital Curation Technologies
      [Figure labels: language and knowledge technologies, curation technologies, sector-specific technologies, platform technologies, sector-specific solutions]
      •  Support curation processes through sophisticated language and knowledge technologies.
      •  Goal: transfer of these technologies into industry through a platform for digital curation technologies.
  13. [Figure labels: Information, Input, Processes, Software, Output]
      •  Investigative journalist
      •  Curator of an exhibition
      •  TV editor
      •  Author
      •  Scholar
      •  Knowledge worker
      •  Curator of digital information
  14. Sectors
      •  Input: tweet, newspaper article, wire copy, facebook status update, search result, email, text message, concept, text file, video, map, stockphoto, in-house database, calendar entry, spreadsheet, archive, etc.
      •  Processes: analyse, select, focus, revise, read up on, write, create, research, assess, evaluate, arrange, sort, structure, summarise, shorten, translate, catch up on, combine, abstract, integrate, visualise, generate, annotate, reference, etc.
      •  Software: text processor, presentation, spreadsheet, email, browser, groupware, sector-specific application, CMS, ECMS, CRM, enterprise software, graphics/layouting software, IP telephony, etc.
      •  Output: newspaper article, multimedia website, tv report, exhibition catalogue, mobile application, mashup (e.g., map), text piece, concept, timeline, study, presentation, fact collection, description of an exhibit, analysis, etc.
  15. Curation Processes
      •  Structure visualisation
      •  Multilingual multimedia sources
      •  Crossmedia recommendations
      •  Multilingual summarisation
      •  Event timelining
      •  Semantification of content
      •  Multilingual sentiment analysis
      •  Semantic story-telling
      •  Ontology-based knowledge structures
  16. Platform for digital curation technologies
      [Figure labels: broker, REST API, curation service 1 (language or knowledge technology), curation service 2 (language or knowledge technology), external service 1, external service 2, clients using the API, pipelined curation workflow]
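The pipelined curation workflow named on this slide can be pictured as a broker that passes a document through a sequence of services. The following Python sketch is illustrative only: `Broker`, `register`, `run_pipeline` and the two toy services are hypothetical names, not the platform's API, and a real broker would presumably call its services over REST instead of invoking in-process functions.

```python
from typing import Callable, Dict, List

# A curation service takes a document (a dict) and returns an enriched copy.
Service = Callable[[dict], dict]


class Broker:
    """Hypothetical in-process stand-in for a broker that chains services."""

    def __init__(self) -> None:
        self._services: Dict[str, Service] = {}

    def register(self, name: str, service: Service) -> None:
        self._services[name] = service

    def run_pipeline(self, names: List[str], document: dict) -> dict:
        # Feed the output of each service into the next one, in order.
        for name in names:
            document = self._services[name](document)
        return document


def detect_language(doc: dict) -> dict:
    # Toy heuristic, used only to keep the sketch self-contained.
    lang = "de" if " und " in f" {doc['text']} " else "en"
    return {**doc, "language": lang}


def count_tokens(doc: dict) -> dict:
    # Whitespace tokenisation as a placeholder for a real tokeniser.
    return {**doc, "token_count": len(doc["text"].split())}


broker = Broker()
broker.register("language", detect_language)
broker.register("tokens", count_tokens)

result = broker.run_pipeline(
    ["language", "tokens"], {"text": "Annotations und Sprache"}
)
print(result)
```

Keeping each service a pure function from document to document is one simple way to make pipelines composable; the order of names passed to `run_pipeline` defines the workflow.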
  17. Platform for digital curation technologies: Example
      •  Annotation of time expressions, needed for visualisation of time-lining
      •  Input: text content; output: list of time expressions and mean dates
      •  Storage using the Web Annotation model
      •  http://dkt-projekt.github.io/webAnnotation/webannotation-dkt.html
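To make the idea of storing time expressions with the Web Annotation model more concrete, here is a minimal Python sketch. It rests on several assumptions that are not in the slides: a four-digit year stands in for a time expression, the function name `annotate_time_expressions` and the example IRI are made up, and the JSON-LD layout follows the general W3C Web Annotation Data Model, which may differ from the serialisation documented at the project URL above.

```python
import json
import re
from typing import List

# Assumption: a four-digit year is treated as a "time expression".
# A real temporal tagger would recognise far more patterns.
YEAR_PATTERN = re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b")


def annotate_time_expressions(text: str, source_iri: str) -> List[dict]:
    """Return one Web Annotation (as a dict) per year mention in the text."""
    annotations = []
    for match in YEAR_PATTERN.finditer(text):
        annotations.append({
            "@context": "http://www.w3.org/ns/anno.jsonld",
            "type": "Annotation",
            "motivation": "tagging",
            "body": {
                "type": "TextualBody",
                "value": match.group(0),
                "purpose": "tagging",
            },
            "target": {
                "source": source_iri,
                # Character offsets into the plain text of the source.
                "selector": {
                    "type": "TextPositionSelector",
                    "start": match.start(),
                    "end": match.end(),
                },
            },
        })
    return annotations


text = "The wall fell in 1989 and the treaty followed in 1990."
annotations = annotate_time_expressions(text, "http://example.org/doc1")
print(json.dumps(annotations[0], indent=2))
```

Because each annotation points at its target by IRI and character offsets, the annotations can be stored separately from the analysed text.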
  18. Input [screenshot]
  19. Output [screenshot labels: mean dates, intervals, JSON-LD representation]
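The slides do not define how mean dates are computed. One plausible reading, shown here purely as an assumption, is that an interval contributes its midpoint and that a mean date is the arithmetic mean of a set of dates. The function names `midpoint` and `mean_date` and the sample intervals are hypothetical.

```python
from datetime import date
from typing import List, Tuple


def midpoint(start: date, end: date) -> date:
    """Midpoint of a closed date interval, rounded down to a whole day."""
    return date.fromordinal((start.toordinal() + end.toordinal()) // 2)


def mean_date(dates: List[date]) -> date:
    """Arithmetic mean of dates, computed on day ordinals (rounded down)."""
    if not dates:
        raise ValueError("mean_date() needs at least one date")
    return date.fromordinal(sum(d.toordinal() for d in dates) // len(dates))


# Made-up intervals, as a temporal tagger might return them.
intervals: List[Tuple[date, date]] = [
    (date(2016, 1, 1), date(2016, 1, 3)),
    (date(2016, 1, 10), date(2016, 1, 20)),
]
midpoints = [midpoint(start, end) for start, end in intervals]
print(midpoints, mean_date(midpoints))
```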
  20. Web Annotations for HQMT
      •  Current MT research workflows use several specialised and incompatible tools and distributed repositories.
      •  Ideal scenario: one coherent, interoperable and integrated ecosystem of tools.
      •  Centrally stored web annotations would be a massive step in the right direction!
      http://www.cracking-the-language-barrier.eu/mt-eval-workshop-2016/
      [Figure labels:]
      •  Human Evaluation: Ranking, Post-Editing, Error Annotation (MQM), Task based Evaluation
      •  Translation Production Workflows: Sampling, Filtering, Translation Memory Inclusion, Terminology Checking
      •  Linguistic Analysis: Tokenisation, POS tagging, Parsing, Entity recognition, WSD
      •  Machine Translation: Services, Development
      •  Automatic Evaluation: BLEU, Quality Estimation, PE-Distance, Test-Suites
      •  Further labels: REPOSITORY, COCKPIT, BACKEND, DATA SETS, META-SHARE, WMT, JRC, CLARIN
  21. Multidimensional Quality Metrics: MQM for MT diagnostics
      •  Customisable framework for translation quality metrics
      •  Early version standardised in W3C’s ITS 2.0
      •  Annotations in current workflows are typically proprietary, tool-, format- and workflow-based.
      •  Web annotations could enable the creation of a collaborative corpus of translation data for the whole community.
      •  Feedback into MT engines through annotated web-scale corpora could lead to a boost in performance and quality.
      •  Next slide: conversion of proprietary tool format to Web Annotations.
  22. From MQM to Web Annotations
      [Figure labels: proprietary and tool-specific CSV, MQM issue type, Web Annotation (intermediate XML syntax)]
      https://github.com/dkt-projekt/webAnnotation/tree/gh-pages/mqm-webannotation
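The conversion from a tool-specific CSV export to Web Annotations could look roughly like the Python sketch below. Everything specific in it is an assumption: the CSV columns (`document`, `start`, `end`, `issue_type`), the sample rows and the function name `csv_to_annotations` are invented for illustration, and the output is a JSON-LD-style dict in the shape of the W3C Web Annotation Data Model, not the intermediate XML syntax shown on the slide.

```python
import csv
import io
from typing import Iterator

# Hypothetical CSV layout; real tool exports will differ.
SAMPLE_CSV = """document,start,end,issue_type
http://example.org/mt/output1,4,9,Mistranslation
http://example.org/mt/output1,15,22,Terminology
"""


def csv_to_annotations(csv_text: str) -> Iterator[dict]:
    """Yield one Web Annotation dict per CSV row."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        yield {
            "@context": "http://www.w3.org/ns/anno.jsonld",
            "type": "Annotation",
            "motivation": "classifying",
            # The MQM issue type is carried as a plain-text classification.
            "body": {
                "type": "TextualBody",
                "value": row["issue_type"],
                "purpose": "classifying",
            },
            "target": {
                "source": row["document"],
                "selector": {
                    "type": "TextPositionSelector",
                    "start": int(row["start"]),
                    "end": int(row["end"]),
                },
            },
        }


annotations = list(csv_to_annotations(SAMPLE_CSV))
print(len(annotations), annotations[0]["body"]["value"])
```

A production converter would more likely point the body at a stable identifier for the issue type instead of a bare string; the string is used here only to avoid inventing identifiers.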
  23. Web Annotation Infrastructure
      •  Web annotations themselves work on language.
      •  Language Technology could help build better services.
      •  Anchoring annotations to changing content in a robust way is apparently tricky.
      •  Semantic methods for identifying the new position of the original anchors that have changed since the annotation was put there.
      •  Annotating all copies of the document that is currently being annotated: application of methods for duplicate detection or near duplicate detection.
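The anchoring problem can be illustrated with a small Python sketch. Note that it is not a semantic method as envisaged on the slide: it is a much simpler character-level fuzzy match using `difflib.SequenceMatcher` from the standard library, swapped in only to show what re-anchoring has to achieve. The function name `reanchor`, the threshold of 0.8 and the sample texts are assumptions.

```python
from difflib import SequenceMatcher
from typing import Optional, Tuple


def reanchor(
    quote: str, new_text: str, threshold: float = 0.8
) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the best match for quote in new_text, or None."""
    # Cheap path: the quoted passage survived unchanged.
    exact = new_text.find(quote)
    if exact != -1:
        return exact, exact + len(quote)

    best_ratio, best_start = 0.0, -1
    window = len(quote)
    # Compare the quote with every window of the same length.
    for start in range(0, max(1, len(new_text) - window + 1)):
        candidate = new_text[start:start + window]
        ratio = SequenceMatcher(None, quote, candidate).ratio()
        if ratio > best_ratio:
            best_ratio, best_start = ratio, start

    if best_ratio >= threshold:
        return best_start, min(len(new_text), best_start + window)
    return None  # The anchor is considered lost.


old_quote = "analysing UGC is difficult and costly"
new_text = "Today, analyzing UGC is difficult and costly for many teams."
span = reanchor(old_quote, new_text)
print(span)
```

The sliding window is quadratic in the worst case, which is acceptable for a sketch but not for long documents; it also only handles edits that keep the passage at roughly the same length.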
  24. Vision 2020
      •  Next generation personal assistant.
      •  Highly personalised, assisted browsing experience.
      •  Semantic language technologies in the background.
      •  Detection of the user’s tasks, intentions, preferences.
      •  Annotation of relevant, surprising, new facts in current and future content through web annotations.
      •  Anticipation of the user’s next steps.
      •  Suggestion of related content based on user modelling and semantic story telling.
      Georg Rehm and Hans Uszkoreit (eds.). The META-NET Strategic Research Agenda for Multilingual Europe 2020. Springer, 2013; see Priority Research Theme “Socially-Aware Interactive Assistant”.
  25. So, are Web Annotations a game changer for Language Technology?
      Yes, most certainly – if the UX and browser support are done right.
      Maybe Language Technology can also be a game changer for Web Annotations.
  26. Thank you!
      Beyond Multilingual Europe, 04/05 July 2016, Lisbon, Portugal. http://www.meta-forum.eu
      Deadline for submissions: 29 May 2016
