Georg Rehm, Felix Sasaki, Aljoscha Burchardt
DFKI GmbH – Language Technology Lab, Berlin
Web Annotations
A Game Changer for Language Technologies?
Language Technology
•  Language Technology is a heterogeneous and evolving
set of applications that involve the
–  (semi-)automatic processing (analysis) or
–  (semi-)automatic production
of human language (written or spoken).
•  Driven by NLP, CL, Linguistics, CompSci, CogSci, AI.
•  Methods operate on language data (often web-scale)
•  Rule-based tools, statistics (machine learning)
•  Need for human experts to analyse and annotate data
sets with highly specialised linguistic analysis information
Web Annotations and Language Technology – I Annotate 2016 2
Selected LT Applications
Spell checking, grammar checking
Search engines (IR)
Interactive personal assistants (Cortana, Siri etc.)
Machine Translation
Recommender systems
Social media (analytics, streams)
Knowledge-based systems
Web Annotation Architecture
http://www.w3.org/annotation
What is the relationship between
Web Annotations
and Language Technology?
Web Annotation Architecture
Content could be created by Language
Technology fully automatically or in a
semi-automatic way (text generation).
Web Annotation Architecture
Content could be analysed by
Language Technology (semantic
analysis, input for ML algorithms etc.)
Web Annotation Architecture
Especially in Social Media Analytics (SMA) we
are very interested in user-generated content (UGC),
i.e., in comments and feedback – “what do users
think of a certain product?” etc.
Web Annotation Architecture
•  Today, analysing UGC is difficult
and costly (many heterogeneous
sources, many different formats).
•  A few established and widely used
Web Annotation services would
simplify SMA dramatically!
Web Annotation Architecture
We can also use Language Technology
methods to create (or help create)
annotations, for example, in a smart
authoring scenario.
LT and Web Annotations
•  Analysis of web annotations and making use of web
annotations through Language Technology:
–  Arbitrary web annotations (i.e., unstructured text)
•  No more crawling, aggregating, mapping!
–  Dedicated LT-specific web annotations
•  Annotating language data without any specialised
stand-alone tools or data repositories!
•  Generation of web annotations through Language
Technology (e.g., to provide background information on
important content – see, e.g., the Pundit use cases).
Example Scenarios
•  Two example scenarios to demonstrate how Language
Technology and Web Annotations go together.
•  Scenario 1 – Digital Curation Technologies:
Semantification of content for curators of digital information
•  Scenario 2 – Machine Translation:
Web Annotations for High-Quality Machine Translation
Digital Curation Technologies
•  Support curation processes through sophisticated
language and knowledge technologies.
•  Goal: transfer of these technologies into industry
through a platform for digital curation technologies.
[Figure: technology layers – language and knowledge technologies, curation technologies, sector-specific technologies, platform technologies, sector-specific solutions]
[Figure: Information – Input, Processes, Software, Output]
Sectors
•  Investigative journalist
•  Curator of an exhibition
•  TV editor
•  Author
•  Scholar
•  Knowledge worker
•  Curator of digital information
Input: tweet, newspaper article, wire copy, Facebook status update, search result, email, text message, concept, text file, video, map, stock photo, in-house database, calendar entry, spreadsheet, archive, etc.
Processes: analyse, select, focus, revise, read up on, write, create, research, assess, evaluate, arrange, sort, structure, summarise, shorten, translate, catch up on, combine, abstract, integrate, visualise, generate, annotate, reference, etc.
Software: text processor, presentation, spreadsheet, email, browser, groupware, sector-specific application, CMS, ECMS, CRM, enterprise software, graphics/layouting software, IP telephony, etc.
Output: newspaper article, multimedia website, TV report, exhibition catalogue, mobile application, mashup (e.g., map), text piece, concept, timeline, study, presentation, fact collection, description of an exhibit, analysis, etc.
Curation Processes
•  Structure visualisation
•  Multilingual multimedia sources
•  Crossmedia recommendations
•  Multilingual summarisation
•  Event timelining
•  Semantification of content
•  Multilingual sentiment analysis
•  Semantic story-telling
•  Ontology-based knowledge structures
[Figure: platform for digital curation technologies – clients using the API, broker REST API, curation services 1 and 2 (each a language or knowledge technology), external services 1 and 2, pipelined curation workflow]
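A pipelined curation workflow of this kind can be pictured as a chain of services, each of which reads a document and appends its own annotations. The following is a minimal sketch only: the service functions `tag_capitalised` and `tag_years`, the dict-based document format and `run_pipeline` are hypothetical stand-ins for illustration, not part of the platform, and no REST calls are made.

```python
import re
from typing import Any, Callable, Dict, List

# Assumed exchange format: a dict holding the text and a list of annotations.
Document = Dict[str, Any]
Service = Callable[[Document], Document]


def _spans(pattern: str, label: str, text: str) -> List[Dict[str, Any]]:
    """Return one annotation dict per regex match in the text."""
    return [
        {"type": label, "start": m.start(), "end": m.end(), "text": m.group()}
        for m in re.finditer(pattern, text)
    ]


def tag_capitalised(doc: Document) -> Document:
    """Hypothetical service: marks capitalised words as entity candidates."""
    found = _spans(r"\b[A-Z][a-z]+\b", "candidate", doc["text"])
    return {**doc, "annotations": doc["annotations"] + found}


def tag_years(doc: Document) -> Document:
    """Hypothetical service: marks four-digit years."""
    found = _spans(r"\b(?:1\d{3}|20\d{2})\b", "year", doc["text"])
    return {**doc, "annotations": doc["annotations"] + found}


def run_pipeline(text: str, services: List[Service]) -> Document:
    """Pass the document through each service in order."""
    doc: Document = {"text": text, "annotations": []}
    for service in services:
        doc = service(doc)
    return doc


if __name__ == "__main__":
    result = run_pipeline(
        "Berlin hosted the workshop in 2016.", [tag_capitalised, tag_years]
    )
    for annotation in result["annotations"]:
        print(annotation)
```

Because every service has the same signature, services can be reordered or swapped without touching the pipeline runner.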
Example
•  Annotation of time expressions – needed for timeline visualisation
•  Input: text content – output: list of time expressions and mean dates
•  Storage using the Web Annotation model
•  http://dkt-projekt.github.io/webAnnotation/webannotation-dkt.html
[Figure: Input – text content]
[Figure: Output – mean dates and intervals in a JSON-LD representation]
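To make the idea of storing time expressions as Web Annotations more concrete, here is a minimal sketch. It assumes that only ISO-formatted dates (YYYY-MM-DD) are detected, and it uses generic properties from the W3C Web Annotation Data Model (`TextualBody`, `TextPositionSelector`). The function names, the example source URL and the exact JSON-LD layout are assumptions for illustration and may well differ from the representation used in the DKT project.

```python
import json
import re
from datetime import date

ISO_DATE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")


def annotate_dates(text, source):
    """Return one Web Annotation (as a dict) per ISO date found in the text."""
    annotations = []
    for match in ISO_DATE.finditer(text):
        year, month, day = map(int, match.groups())
        try:
            value = date(year, month, day)
        except ValueError:
            continue  # skip impossible dates such as 2016-13-45
        annotations.append({
            "@context": "http://www.w3.org/ns/anno.jsonld",
            "type": "Annotation",
            "body": {
                "type": "TextualBody",
                "value": value.isoformat(),
                "purpose": "tagging",
            },
            "target": {
                "source": source,
                "selector": {
                    "type": "TextPositionSelector",
                    "start": match.start(),
                    "end": match.end(),
                },
            },
        })
    return annotations


def mean_date(annotations):
    """Average the annotated dates (assumed here to be what a mean date is)."""
    ordinals = [
        date.fromisoformat(a["body"]["value"]).toordinal() for a in annotations
    ]
    return date.fromordinal(round(sum(ordinals) / len(ordinals)))


if __name__ == "__main__":
    sample = "Submissions close on 2016-05-29; the event runs 2016-07-04 to 2016-07-05."
    found = annotate_dates(sample, "http://example.org/doc1")  # hypothetical URL
    print(json.dumps(found[0], indent=2))
    print("mean date:", mean_date(found))
```

Character offsets are used as the selector here because they are easy to compute; they are also the most fragile choice when the annotated content changes.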
Web Annotations for HQMT
•  Current MT research workflows use several specialised and
incompatible tools and distributed repositories.
•  Ideal scenario: one coherent, interoperable and integrated
ecosystem of tools.
•  Centrally stored web annotations would be a massive step
in the right direction!
http://www.cracking-the-language-barrier.eu/mt-eval-workshop-2016/
Human Evaluation: Ranking, Post-Editing, Error Annotation (MQM), Task-based Evaluation
Translation Production Workflows: Sampling, Filtering, Translation Memory Inclusion, Terminology Checking
Linguistic Analysis: Tokenisation, POS tagging, Parsing, Entity recognition, WSD
Machine Translation: Services, Development
Automatic Evaluation: BLEU, Quality Estimation, PE-Distance, Test-Suites
REPOSITORY, COCKPIT, BACKEND, DATA SETS
META-SHARE, WMT, JRC, CLARIN
Multidimensional Quality Metrics
MQM for MT diagnostics
•  Customisable framework for translation quality metrics
•  Early version standardised in W3C’s ITS 2.0
•  Annotations in current workflows are typically
proprietary, tool-, format- and workflow-based.
•  Web annotations could enable the creation of a
collaborative corpus of translation data for the
whole community.
•  Feedback into MT engines through annotated
web-scale corpora could lead to a boost in
performance and quality.
•  Next slide: conversion of proprietary tool format
to Web Annotations.
From MQM to Web Annotations
Web Annotation
(intermediate XML syntax)
Proprietary and tool-specific CSV
MQM issue type
https://github.com/dkt-projekt/webAnnotation/tree/gh-pages/mqm-webannotation
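The conversion from a tool-specific CSV export to Web Annotations can be sketched as follows. Note the assumptions: the CSV column names (`segment_url`, `start`, `end`, `mqm_issue_type`) and the sample rows are invented for illustration, and the sketch emits JSON-LD directly rather than the intermediate XML syntax shown on the slide. The converter in the repository linked above may work quite differently.

```python
import csv
import io
import json

# Hypothetical export of an annotation tool; column names are assumptions.
SAMPLE_CSV = """segment_url,start,end,mqm_issue_type
http://example.org/mt/seg1,4,11,Mistranslation
http://example.org/mt/seg2,0,5,Terminology
"""


def mqm_csv_to_annotations(csv_text):
    """Turn each CSV row into a Web Annotation carrying an MQM issue type."""
    annotations = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        annotations.append({
            "@context": "http://www.w3.org/ns/anno.jsonld",
            "type": "Annotation",
            "body": {
                "type": "TextualBody",
                "value": row["mqm_issue_type"],
                "purpose": "classifying",
            },
            "target": {
                "source": row["segment_url"],
                "selector": {
                    "type": "TextPositionSelector",
                    "start": int(row["start"]),
                    "end": int(row["end"]),
                },
            },
        })
    return annotations


if __name__ == "__main__":
    print(json.dumps(mqm_csv_to_annotations(SAMPLE_CSV), indent=2))
```

Once error annotations are expressed in a shared model like this, annotations produced by different tools can in principle be pooled into a single corpus.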
Web Annotation Infrastructure
•  Web annotations themselves work on language.
•  Language Technology could help build better services.
•  Anchoring annotations to changing content in a
robust way is apparently tricky.
•  Semantic methods could identify the new position of
the original anchors when the content has changed
since the annotation was created.
•  Annotating all copies of the document that is
currently being annotated: application of methods for
duplicate or near-duplicate detection.
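Both infrastructure problems can be illustrated with simple string-level baselines. This is a rough sketch under stated assumptions: character similarity from Python's `difflib` stands in for the semantic methods envisaged here, word-shingle Jaccard similarity stands in for a real near-duplicate detector, and the function names and the 0.8 threshold are arbitrary choices for illustration.

```python
from difflib import SequenceMatcher


def reanchor(quote, new_text, threshold=0.8):
    """Find the span in new_text that best matches the originally annotated quote.

    Returns (start, end), or None if nothing is similar enough.
    """
    exact = new_text.find(quote)
    if exact != -1:
        return exact, exact + len(quote)
    width = len(quote)
    best_ratio, best_start = 0.0, -1
    # Slide a window of the quote's length over the changed text.
    for start in range(max(1, len(new_text) - width + 1)):
        window = new_text[start:start + width]
        ratio = SequenceMatcher(None, quote, window).ratio()
        if ratio > best_ratio:
            best_ratio, best_start = ratio, start
    if best_ratio >= threshold:
        return best_start, best_start + width
    return None


def shingles(text, k=3):
    """Set of overlapping k-word sequences (shingles) of a text."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}


def jaccard(text_a, text_b, k=3):
    """Jaccard similarity of the two shingle sets, between 0.0 and 1.0."""
    a, b = shingles(text_a, k), shingles(text_b, k)
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    changed = "The color of the annotation changed."
    span = reanchor("colour of the annotation", changed)
    print(span, repr(changed[span[0]:span[1]]) if span else None)
    print(jaccard("a b c d e", "a b c d f"))
```

The sliding-window search is quadratic in the worst case, so it only suits short documents; a production service would need indexing or hashing to scale.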
Vision 2020
•  Next generation personal assistant.
•  Highly personalised, assisted browsing experience.
•  Semantic language technologies in the background.
•  Detection of the user’s tasks, intentions, preferences.
•  Annotation of relevant, surprising, new facts in current
and future content through web annotations.
•  Anticipation of the user’s next steps.
•  Suggestion of related content based on
user modelling and semantic story-telling.
Georg Rehm and Hans Uszkoreit (eds.). The META-NET Strategic Research Agenda for Multilingual
Europe 2020. Springer, 2013; see Priority Research Theme “Socially-Aware Interactive Assistant”.
So, are Web Annotations a game changer
for Language Technology?
Yes, most certainly – if the UX and
browser support are done right.
Maybe Language Technology can also be
a game changer for Web Annotations.
Thank you!
Beyond Multilingual Europe
04/05 July, 2016 – Lisbon, Portugal
http://www.meta-forum.eu
Deadline for submissions: 29 May 2016

 
“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
 

Web Annotations – A Game Changer for Language Technology?

  • 1. Georg Rehm, Felix Sasaki, Aljoscha Burchardt. DFKI GmbH – Language Technology Lab, Berlin. Web Annotations: A Game Changer for Language Technologies?
  • 2. Language Technology
      • Language Technology is a heterogeneous and evolving set of applications that involve the
          – (semi-)automatic processing (analysis) or
          – (semi-)automatic production
        of human language (written or spoken).
      • Driven by NLP, CL, Linguistics, CompSci, CogSci, AI.
      • Methods operate on language data (often web-scale).
      • Rule-based tools, statistics (machine learning).
      • Need for human experts to analyse and annotate data sets with highly specialised linguistic analysis information.
      Web Annotations and Language Technology – I Annotate 2016
  • 3. Selected LT Applications
      • Spell checking, grammar checking
      • Search engines (IR)
      • Interactive personal assistants (Cortana, Siri etc.)
      • Machine Translation
      • Recommender systems
      • Social media (analytics, streams)
      • Knowledge-based systems
  • 4. Web Annotation Architecture (figure: Web annotation architecture, http://www.w3.org/annotation). What is the relationship between Web Annotations and Language Technology?
  • 5. Web Annotation Architecture: Content could be created by Language Technology fully automatically or in a semi-automatic way (text generation).
  • 6. Web Annotation Architecture: Content could be analysed by Language Technology (semantic analysis, input for ML algorithms etc.).
  • 7. Web Annotation Architecture: Especially in Social Media Analytics (SMA) we are very interested in user-generated content (UGC), i.e., in comments and feedback: “what do users think of a certain product?” etc.
  • 8. Web Annotation Architecture
      • Today, analysing UGC is difficult and costly (many heterogeneous sources, many different formats).
      • A few established and widely used Web Annotation services would simplify SMA dramatically!
  • 9. Web Annotation Architecture: We can also use Language Technology methods to create (or help create) annotations, for example, in a smart authoring scenario.
  • 10. LT and Web Annotations
      • Analysis of web annotations and making use of web annotations through Language Technology:
          – Arbitrary web annotations (i.e., unstructured text)
              • No more crawling, aggregating, mapping!
          – Dedicated LT-specific web annotations
              • Annotating language data without any specialised stand-alone tools or data repositories!
      • Generation of web annotations through Language Technology (e.g., to provide background information on important content; see, e.g., the Pundit use cases).
  • 11. Example Scenarios
      • Two example scenarios to demonstrate how Language Technology and Web Annotations go together.
      • Scenario 1 – Digital Curation Technologies: Semantification of content for curators of digital information
      • Scenario 2 – Machine Translation: Web Annotations for High-Quality Machine Translation
  • 12. Digital Curation Technologies
      [Figure labels: language and knowledge technologies; curation technologies; sector-specific technologies; platform technologies; sector-specific solutions]
      • Support curation processes through sophisticated language and knowledge technologies.
      • Goal: transfer of these technologies into industry through a platform for digital curation technologies.
  • 13. [Figure: Information, with the labels Input, Processes, Software, Output]
      • Investigative journalist
      • Curator of an exhibition
      • TV editor
      • Author
      • Scholar
      • Knowledge worker
      • Curator of digital information
  • 14. Sectors (table with four columns)
      • Input: tweet, newspaper article, wire copy, Facebook status update, search result, email, text message, concept, text file, video, map, stock photo, in-house database, calendar entry, spreadsheet, archive, etc.
      • Processes: analyse, select, focus, revise, read up on, write, create, research, assess, evaluate, arrange, sort, structure, summarise, shorten, translate, catch up on, combine, abstract, integrate, visualise, generate, annotate, reference, etc.
      • Software: text processor, presentation, spreadsheet, email, browser, groupware, sector-specific application, CMS, ECMS, CRM, enterprise software, graphics/layout software, IP telephony, etc.
      • Output: newspaper article, multimedia website, TV report, exhibition catalogue, mobile application, mashup (e.g., map), text piece, concept, timeline, study, presentation, fact collection, description of an exhibit, analysis, etc.
  • 15. Curation Processes
      • Structure visualisation
      • Multilingual multimedia sources
      • Crossmedia recommendations
      • Multilingual summarisation
      • Event timelining
      • Semantification of content
      • Multilingual sentiment analysis
      • Semantic story-telling
      • Ontology-based knowledge structures
  • 16. [Figure: platform for digital curation technologies. Labelled components: broker; REST API; curation service 1 and curation service 2 (each a language or knowledge technology); external service 1 and external service 2; several clients using the API; pipelined curation workflow.]
  • 17. Example (shown next to the platform figure: broker, REST API, curation services, external services, clients using the API, pipelined curation workflow)
      • Annotation of time expressions, needed for timeline visualisation
      • Input: text content; output: list of time expressions and mean dates
      • Storage using the Web Annotation model
      • http://dkt-projekt.github.io/webAnnotation/webannotation-dkt.html
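The time-expression example on slide 17 can be illustrated with a minimal Python sketch. It is not the DKT service itself: the regular expression (ISO-style dates only), the `id_base` and the source URL are assumptions made for illustration, the JSON-LD field names follow the W3C Web Annotation Data Model as far as we know it, and `mean_date` is only one possible reading of "mean dates".

```python
import json
import re
from datetime import date

# Assumption: only ISO-style dates (YYYY-MM-DD) count as time expressions here.
ISO_DATE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")


def annotate_time_expressions(text, source_url, id_base="https://example.org/anno/"):
    """Return one Web Annotation (as a dict) per valid ISO date found in `text`.

    `source_url` and `id_base` are hypothetical placeholders.
    """
    annotations = []
    for match in ISO_DATE.finditer(text):
        try:
            parsed = date(*(int(part) for part in match.groups()))
        except ValueError:
            continue  # e.g. 2016-13-45 looks like a date but is not one
        annotations.append({
            "@context": "http://www.w3.org/ns/anno.jsonld",
            "id": f"{id_base}{len(annotations) + 1}",
            "type": "Annotation",
            "motivation": "tagging",
            "body": {
                "type": "TextualBody",
                "value": parsed.isoformat(),
                "purpose": "tagging",
            },
            "target": {
                "source": source_url,
                # Character offsets of the time expression in the input text.
                "selector": {
                    "type": "TextPositionSelector",
                    "start": match.start(),
                    "end": match.end(),
                },
            },
        })
    return annotations


def mean_date(dates):
    """Average a non-empty list of dates via their ordinal day numbers."""
    ordinals = [d.toordinal() for d in dates]
    return date.fromordinal(round(sum(ordinals) / len(ordinals)))


if __name__ == "__main__":
    sample = "The workshop ran from 2016-05-19 to 2016-05-21."
    found = annotate_time_expressions(sample, "https://example.org/doc")
    print(json.dumps(found, indent=2))
    print(mean_date([date.fromisoformat(a["body"]["value"]) for a in found]))
```

Storing character offsets in a `TextPositionSelector` keeps the sketch short; it breaks as soon as the target text changes, which is the anchoring problem raised on slide 23.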
  • 18. Input
  • 20. Web Annotations for HQMT
      • Current MT research workflows use several specialised and incompatible tools and distributed repositories.
      • Ideal scenario: one coherent, interoperable and integrated ecosystem of tools.
      • Centrally stored web annotations would be a massive step in the right direction!
      [Figure, http://www.cracking-the-language-barrier.eu/mt-eval-workshop-2016/, with the labels REPOSITORY, COCKPIT, BACKEND and the following groups:]
          – Human Evaluation: Ranking, Post-Editing, Error Annotation (MQM), Task-based Evaluation
          – Translation Production Workflows: Sampling, Filtering, Translation Memory Inclusion, Terminology Checking
          – Linguistic Analysis: Tokenisation, POS tagging, Parsing, Entity recognition, WSD
          – Machine Translation: Services, Development
          – Automatic Evaluation: BLEU, Quality Estimation, PE-Distance, Test-Suites
          – Data sets: META-SHARE, WMT, JRC, CLARIN
  • 21. Multidimensional Quality Metrics (MQM) for MT diagnostics
      • Customisable framework for translation quality metrics
      • Early version standardised in W3C’s ITS 2.0
      • Annotations in current workflows are typically proprietary, tool-, format- and workflow-based.
      • Web annotations could enable the creation of a collaborative corpus of translation data for the whole community.
      • Feedback into MT engines through annotated web-scale corpora could lead to a boost in performance and quality.
      • Next slide: conversion of proprietary tool format to Web Annotations.
  • 22. From MQM to Web Annotations
      [Figure labels: proprietary and tool-specific CSV; MQM issue type; Web Annotation (intermediate XML syntax)]
      https://github.com/dkt-projekt/webAnnotation/tree/gh-pages/mqm-webannotation
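The conversion on slide 22 can be sketched roughly as follows. The slide does not show the actual CSV layout, so the column names (`segment_id`, `start`, `end`, `issue_type`), the sample rows and the `base_url` are hypothetical. The slide also mentions an intermediate XML syntax, whereas this sketch emits JSON-LD-style dictionaries following the W3C Web Annotation Data Model.

```python
import csv
import io

# Hypothetical export of a tool-specific error-annotation CSV (illustrative only).
SAMPLE_CSV = """segment_id,start,end,issue_type
seg-1,4,9,Mistranslation
seg-2,0,3,Omission
"""


def mqm_csv_to_annotations(csv_text, base_url="https://example.org/"):
    """Turn each CSV row into a Web Annotation dict that classifies a text span."""
    annotations = []
    rows = csv.DictReader(io.StringIO(csv_text))
    for number, row in enumerate(rows, start=1):
        annotations.append({
            "@context": "http://www.w3.org/ns/anno.jsonld",
            "id": f"{base_url}anno/{number}",
            "type": "Annotation",
            "motivation": "classifying",
            # The MQM issue type becomes the annotation body.
            "body": {
                "type": "TextualBody",
                "value": row["issue_type"],
                "purpose": "classifying",
            },
            # The annotated span is addressed by character offsets in the segment.
            "target": {
                "source": f"{base_url}segments/{row['segment_id']}",
                "selector": {
                    "type": "TextPositionSelector",
                    "start": int(row["start"]),
                    "end": int(row["end"]),
                },
            },
        })
    return annotations


if __name__ == "__main__":
    for annotation in mqm_csv_to_annotations(SAMPLE_CSV):
        print(annotation["target"]["source"], annotation["body"]["value"])
```

Keeping the issue type in the body and the span in the target means the same annotations could be stored centrally and reused by other tools, which is the point made on slides 20 and 21.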
  • 23. Web Annotation Infrastructure
      • Web annotations themselves work on language.
      • Language Technology could help build better services.
      • Anchoring annotations to changing content in a robust way is apparently tricky.
      • Semantic methods could identify the new position of an original anchor when the content has changed since the annotation was created.
      • Annotating all copies of the document that is currently being annotated: application of methods for duplicate detection or near-duplicate detection.
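To make the anchoring problem on slide 23 concrete, here is a minimal re-anchoring sketch. It uses plain character-level similarity from Python's standard `difflib`, not the semantic methods the slide proposes, and the function name `reanchor` and the threshold of 0.8 are assumptions. The quoted text could come, for example, from the `exact` field of a Web Annotation `TextQuoteSelector`.

```python
from difflib import SequenceMatcher


def reanchor(quote, new_text, threshold=0.8):
    """Find where `quote` most plausibly sits in `new_text`.

    Returns (start, end) character offsets, or None if no window of the
    revised text is similar enough to the originally annotated quote.
    """
    # Cheap path: the quoted text survived the edit unchanged.
    exact = new_text.find(quote)
    if exact != -1:
        return exact, exact + len(quote)

    # Fallback: slide a window of the quote's length over the new text
    # and keep the most similar window.
    width = len(quote)
    best_score, best_start = 0.0, -1
    for start in range(max(1, len(new_text) - width + 1)):
        window = new_text[start:start + width]
        score = SequenceMatcher(None, quote, window).ratio()
        if score > best_score:
            best_score, best_start = score, start

    if best_score >= threshold:
        return best_start, best_start + width
    return None


if __name__ == "__main__":
    revised = "Today, analyzing UGC is difficult and costly."
    span = reanchor("analysing UGC is difficult", revised)
    print(span, revised[span[0]:span[1]] if span else None)
```

The sliding window is quadratic in the worst case and only tolerates small edits; a production service would need something more robust, which is where the slide sees a role for Language Technology.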
  • 24. Vision 2020
      • Next generation personal assistant.
      • Highly personalised, assisted browsing experience.
      • Semantic language technologies in the background.
      • Detection of the user’s tasks, intentions, preferences.
      • Annotation of relevant, surprising, new facts in current and future content through web annotations.
      • Anticipation of the user’s next steps.
      • Suggestion of related content based on user modelling and semantic story-telling.
      Georg Rehm and Hans Uszkoreit (eds.). The META-NET Strategic Research Agenda for Multilingual Europe 2020. Springer, 2013; see Priority Research Theme “Socially-Aware Interactive Assistant”.
  • 25. So, are Web Annotations a game changer for Language Technology? Yes, most certainly, if the UX and browser support are done right. Maybe Language Technology can also be a game changer for Web Annotations.
  • 26. Thank you! Beyond Multilingual Europe, 04/05 July 2016, Lisbon, Portugal (http://www.meta-forum.eu). Deadline for submissions: 29 May 2016