Towards Universal
Semantic Understanding
of Natural Languages
Yunyao Li (@yunyao_li)
Senior Research Manager
Scalable Knowledge Intelligence
IBM Research - Almaden
How many
languages
are there in the world?
7,102 known languages
23 most spoken languages
4.1+ billion people
Source: https://www.iflscience.com/environment/worlds-most-spoken-languages-and-where-they-are-spoken/
Asia-Pacific Region
> 3,200 Languages
28 Major language families
Source: https://reliefweb.int/sites/reliefweb.int/files/resources/OCHA_ROAP_Language_v6_110519.pdf
Conventional Approach
towards Language
Enablement
English Text → English NLU → English Applications
German Text → German NLU → German Applications
Chinese Text → Chinese NLU → Chinese Applications
Separate NLU pipeline
for each language
Separate application
for each language
Universal Semantic
Understanding of Natural
Languages
English Text, German Text, Chinese Text → Universal NLU → Cross-lingual Applications
Single NLU pipeline for
different languages
Develop once for
different languages
The Challenges
Models
– Low-frequency exceptions
– Built for one task at a time
Training Data
– High quality labeled data is required but hard to obtain
Meaning Representation
– Different meaning representation
• for different languages
• for the same languages
Our Research
Training Data
– Data: Auto-generation + crowd-in-the-loop [ACL’15, EMNLP’16, EMNLP’17, EMNLP’20 Findings]
– Training: Cross-Lingual transfer [EMNLP’20 Findings]
Meaning Representation
– Unified Meaning Representation [ACL’15, ACL’16, ACL-DMR’19]
Models
– Instance-based learning [COLING’16]
– Deep learning + instance-based learning [In Submission]
– Human-machine co-creation [ACL’19, EMNLP’20]
Semantic Role Labeling (SRL)
Who did what to whom, when, where and how?
John hastily ordered a dozen dandelions for Mary from Amazon’s Flower Shop.
order.02 (request to be delivered): DID
A0: Orderer (WHO)
A1: Thing ordered (WHAT)
A2: Benefactive, ordered-for (FOR WHOM)
A3: Source (WHERE)
AM-MNR: Manner (HOW)
Dirk broke the window with a hammer. (A0 Break.01 A1 A2)
The window was broken by Dirk. (A1 Break.01 A0)
The window broke. (A1 Break.01)
Break.01
A0 – Breaker
A1 – Thing broken
A2 – Instrument
A3 – Pieces
Break.15
A0 – Journalist, exposer
A1 – Story, thing exposed
Syntax vs. Semantic Parsing
What type of labels are valid across languages?
• Lexical, morphological and syntactic labels differ greatly
• Shallow semantic labels remain stable
SRL Resources
Other languages
• Chinese Proposition Bank
• Hindi Proposition Bank
• German FrameNet
• French? Spanish? Russian? Arabic? …
English
• FrameNet
• PropBank
1. Limited coverage
2. Language-specific formalisms
订购 (“to order”)
A0: buyer
A1: commodity
A2: seller
order.02
A0: orderer
A1: thing ordered
A2: benefactive, ordered-for
A3: source
We want different languages to share the same semantic labels
Shared Frames Across Languages
Multilingual input text
WhatsApp was bought by Facebook (A1 Buy.01 A0)
Facebook hat WhatsApp gekauft (A0 A1 Buy.01)
Facebook a acheté WhatsApp (A0 Buy.01 A1)
Cross-lingual representation
buy.01
A0 (Buyer): Facebook
A1 (Thing bought): WhatsApp
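The shared-frame idea can be sketched as a small data structure. This is only an illustration: the `Proposition` class and `make_proposition` helper are hypothetical names, and the three values are hand-written stand-ins for what an SRL system might return for the English, German and French sentences.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proposition:
    """A language-independent predicate-argument structure."""
    frame: str    # shared frame label, e.g. "buy.01"
    roles: tuple  # sorted (role label, filler text) pairs


def make_proposition(frame, **roles):
    # Sorting the roles makes the result independent of surface word order.
    return Proposition(frame, tuple(sorted(roles.items())))


# Hand-written stand-ins for SRL output (assumed, not produced by a real parser).
english = make_proposition("buy.01", A1="WhatsApp", A0="Facebook")  # passive voice
german = make_proposition("buy.01", A0="Facebook", A1="WhatsApp")
french = make_proposition("buy.01", A0="Facebook", A1="WhatsApp")

# With shared labels, all three sentences collapse to one representation.
same_meaning = english == german == french
print(same_meaning)
```

Because the labels are shared, downstream code can compare or index propositions without knowing which language they came from.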
Generate SRL resources for many other languages
• Shared frame set
• Minimal effort
Il faut qu’il y ait des responsables
Need.01A0
Je suis responsable pour le chaos
Be.01A1 A2 AM-PRD
Les services postaux ont acheté des …
Be.01 A2A1
Buy.01A0
Corpus of annotated text data
Universal Proposition Banks
Frame set
Buy.01
A0 – Buyer
A1 – Thing bought
A2 – Seller
A3 – Price paid
A4 – Benefactive
Pay.01
A0 – Payer
A1 – Money
A2 – Being paid
A3 – Commodity
Current Practices
Annotator training: months
Annotation: years
Repeat for each language!
Auto-Generation of Universal Proposition Bank
Our Idea: Annotation projection with parallel corpora
Example: TV subtitles
English subtitles: I would buy that for a dollar! (BUYER: I; ITEM: that; PRICE: for a dollar)
German subtitles: Das würde ich für einen Dollar kaufen (after projection: ITEM: Das; BUYER: ich; PRICE: für einen Dollar)
Training data
• Semantically annotated
• Multilingual
• Large amount
Resource: https://www.youtube.com/watch?v=u5HOt0ZOcYk
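A minimal sketch of annotation projection for the subtitle pair, under simplifying assumptions: labels sit on single head tokens, the word alignment is written by hand rather than produced by an aligner, and `project_roles` is a hypothetical helper, not the authors' implementation.

```python
def project_roles(source_roles, alignment):
    """Copy labels from source tokens to the target tokens they align with.

    source_roles: dict mapping source token index -> label
    alignment: list of (source index, target index) pairs
    """
    projected = {}
    for src_idx, tgt_idx in alignment:
        label = source_roles.get(src_idx)
        if label is not None:
            projected[tgt_idx] = label
    return projected


en = "I would buy that for a dollar".split()
de = "Das würde ich für einen Dollar kaufen".split()

# Labels on English head tokens (assumed output of an English SRL system).
source_roles = {0: "BUYER", 2: "PREDICATE", 3: "ITEM", 6: "PRICE"}

# Hand-written alignment: I-ich, would-würde, buy-kaufen, that-Das,
# for-für, a-einen, dollar-Dollar.
alignment = [(0, 2), (1, 1), (2, 6), (3, 0), (4, 3), (5, 4), (6, 5)]

projected = project_roles(source_roles, alignment)
labeled = {de[i]: label for i, label in projected.items()}
print(labeled)
```

The German sentence receives its labels without any German annotator, which is why the quality of the alignment and of the source-side labels matters so much.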
However: Projections Not Always Possible
English sentence: We need to hold people responsible
(labels: Need.01; Hold.01 with A0, A1, A3)
Target sentence: Il faut qu’il y ait des responsables
(literal gloss: There need to be those responsible)
Projected onto target: Hold.01, A1. Incorrect projection!
Main error sources:
• Translation shift
• Source-language SRL errors
Filtered Projection &
Bootstrapping
Two-step process
– Filters to detect translation shift, block
projections (more precision at cost of
recall)
– Bootstrap learning to increase recall
– Generated 7 universal proposition banks
from 3 language groups
• Version 1.0: https://github.com/System-T/UniversalPropositions/
• Version 2.0 coming soon
[ACL’15] Generating High Quality Proposition Banks for Multilingual Semantic Role Labeling.
Multilingual Aliasing
• Problem: Target language frame lexicon
automatically generated from alignments
– False frames
– Redundant frames
• Expert curation of frame mappings
[COLING’16] Multilingual Aliasing for Auto-Generating Proposition
Banks
Low-resource Languages
Apply approach to low-resource languages
Bengali, Malayalam, Tamil
– Fewer sources of parallel data
– Almost no NLP: No syntactic parsing,
lemmatization etc.
Crowdsourcing for data curation
[EMNLP’16] Towards Semi-Automatic Generation of Proposition Banks for Low-
Resource Languages
Crowd-in-the-Loop Curation
[Figure: raw text corpus, corpus with predicted annotations, annotation tasks (all), task router, corpus with curated annotations]
Easy tasks are curated by crowd
Difficult tasks are curated by experts
[EMNLP’17] CROWD-IN-THE-LOOP: A Hybrid Approach for Annotating Semantic Roles
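The routing step can be sketched as follows. The slides only say that easy tasks go to the crowd and difficult ones to experts. The per-task "easy" probability, the 0.5 threshold and the `route_tasks` function are assumptions made for illustration.

```python
def route_tasks(tasks, threshold=0.5):
    """Split annotation tasks between crowd workers and experts.

    tasks: list of (task_id, p_easy) pairs, where p_easy is an assumed
    classifier score estimating that the crowd can curate the task.
    """
    crowd, experts = [], []
    for task_id, p_easy in tasks:
        if p_easy >= threshold:
            crowd.append(task_id)
        else:
            experts.append(task_id)
    return crowd, experts


# Hypothetical scores for four annotation tasks.
tasks = [("t1", 0.9), ("t2", 0.2), ("t3", 0.5), ("t4", 0.4)]
crowd_queue, expert_queue = route_tasks(tasks)
print(crowd_queue, expert_queue)
```

Raising the threshold sends more work to experts, trading cost for accuracy.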
Effectiveness of Crowd-in-the-Loop
Task Router Classifier
↑ 9pp F1 improvement over SRL results
↓ 66.4pp expert efforts
Latest: Filter → Select → Expert
↑ 10pp F1 improvement over SRL results
↓ 87.3pp expert efforts
[Findings of EMNLP’20] A Novel Workflow for Accurately and Efficiently Crowdsourcing Predicate Senses and Argument Labels
WhatsApp was bought by Facebook
Facebook hat WhatsApp gekauft
Facebook a acheté WhatsApp
buy.01
Facebook WhatsApp
Buyer Thing bought
Cross-lingual representation
Multilingual input text
Buy.01 A0A1
Buy.01A1A0
Buy.01A0 A1
Cross-lingual Meaning
Representation
Cross-lingual extraction
Task: Extract who bought what
[NAACL’18] SystemT: Declarative Text Understanding for Enterprise
[ACL’16] POLYGLOT: Multilingual Semantic Role Labeling with Unified Labels
[COLING’16] Multilingual Information Extraction with PolyglotIE
Cross-lingual Transfer?
Challenge:
Low-resource languages lack
- Large monolingual labeled data
- Parallel corpora
Solution:
Transfer knowledge and resources from rich-resource languages to low-resource languages
[Figure: example languages EN, DE, YO, …]
Multilingual or Polyglot
Training
Main Idea
• Combine training data from multiple
languages with multilingual word
embeddings
• Train a common encoder model to enable
parameter sharing.
Challenge
Different languages have different annotation schemes
[Figure: example languages EN, DE, YO, …]
Different Annotations across
Languages
Observation:
Certain argument labels do share common
semantic meaning across languages.
Intuition:
Identify and exploit the commonalities
between annotation of different languages.
Know.01
A0: Knower
A1: Thing known
A2: A1 known about
AM: Adjuncts
Knnen.01
A0: Knower
A1: Entity
AM: Adjuncts
Hypothesis
1. Pair Matching: Identify arguments with similar semantic meaning across languages, and
2. Argument Regularization: Represent them close to each other in the feature space.
[Figure: source manifold with labels A0 and AM-TMP; target manifold with labels ZH-A0 and ZH-TMP]
The Framework
Regularizer is applied to the parameters of the last layer of the model.
[Figure: word representations, BiLSTM encoder, fixed sentence representation, softmax layer; symbols shown: v_i, u_i, b, w_1, w_2, …, w_n]
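One way to picture a regularizer on the last layer: treat each label as a weight vector and penalize the distance between vectors of matched label pairs. The squared Euclidean penalty, the `strength` factor and the toy weights below are assumptions for illustration; the slides do not give the exact form used in the paper.

```python
def argument_regularizer(weights, matched_pairs, strength=1.0):
    """Sum of squared distances between last-layer weight vectors of matched labels.

    weights: dict mapping label -> list of floats (one row of the output layer)
    matched_pairs: list of (source label, target label) pairs
    """
    penalty = 0.0
    for src_label, tgt_label in matched_pairs:
        src_vec, tgt_vec = weights[src_label], weights[tgt_label]
        penalty += sum((a - b) ** 2 for a, b in zip(src_vec, tgt_vec))
    return strength * penalty


# Toy two-dimensional weight vectors (invented values).
weights = {
    "A0": [1.0, 0.0],
    "ZH-A0": [0.5, 0.0],
    "AM-TMP": [0.0, 2.0],
    "ZH-TMP": [0.0, 2.0],
}
matched_pairs = [("A0", "ZH-A0"), ("AM-TMP", "ZH-TMP")]

penalty = argument_regularizer(weights, matched_pairs)
print(penalty)
```

Adding such a term to the training loss would pull matched labels together in feature space while leaving unmatched labels free.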
CLAR Performance
Dataset: CoNLL2009
Ours is SoTA
- Average performance over all languages
- 3 out of 5 non-English languages
General approach:
- Independent of base model
- Independent of language
- Requires no parallel data
Dependency Parsing vs. SRL
[Chart: SRL vs. dependency parsing scores on WSJ and Brown; axis from 75 to 100]
What Makes SRL So Difficult?
Heavy-tailed distribution of class labels
– Common frames
• say.01 (8243), have.01 (2040), sell.01 (1009)
– Many uncommon frames
• swindle.01, feed.01, hum.01, toast.01
– Almost half of all frames seen fewer than 3
times in training data
[Chart: Distribution of frame labels; frequency axis from 0 to 9000]
Many low-frequency exceptions → Difficult to capture in models
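The heavy-tail statistic can be computed from a frame-frequency table as sketched below. The three large counts are the ones quoted on the slide. The counts for the rare frames are invented for illustration, and `rare_frame_share` is a hypothetical helper.

```python
from collections import Counter


def rare_frame_share(frame_counts, min_count=3):
    """Fraction of distinct frames seen fewer than min_count times."""
    rare = [frame for frame, count in frame_counts.items() if count < min_count]
    return len(rare) / len(frame_counts)


frame_counts = Counter({
    "say.01": 8243,   # from the slide
    "have.01": 2040,  # from the slide
    "sell.01": 1009,  # from the slide
    "swindle.01": 1,  # invented count
    "feed.01": 2,     # invented count
    "hum.01": 1,      # invented count
    "toast.01": 2,    # invented count
})

share = rare_frame_share(frame_counts)
print(f"{share:.2f} of distinct frames are rare")
```

A few frames account for almost all occurrences, while most distinct frames are rare, which is the pattern the chart illustrates.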
Low-Frequency Exceptions
Strong correlation of syntactic function of an argument to its role
Example: passive subject
The window was broken by Dirk. (passive subject SBJ labeled A1)
The silver was sold by the man. (passive subject SBJ labeled A1)
Creditors were told to hold off. (passive subject SBJ)
[Figure: dependency arcs over the sentences, including SBJ, VC, PMOD, NMOD, IM, PRT]
TELL.01
A0: speaker (agent)
A1: utterance (topic)
A2: hearer (recipient)
86% of passive subjects are labeled A1 (over 4,000x in training data)
Local Bias
87% of passive subjects of Tell.01 are labeled A2 (53x in training data)
Most Classifiers
– Bag-of-features
– Learn weights for features to classes
– Perform generalization
Question: How do we explicitly
capture low-frequency exceptions?
Instance-based Learning
kNN: k-Nearest Neighbors classification
Find the k most similar instances in training data
Derive class label from nearest neighbors
[Figure: training instances labeled A0, A1 and A2 around an unlabeled instance (?), at distances 1, 2, 3, …, n]
Creditors were told to hold off.
COMPOSITE FEATURE                             DISTANCE
“creditor” passive subject of TELL.01         1
noun passive subject of TELL.01               2
…                                             …
any passive subject of any agentive verb      n
Main idea: Back off to composite feature seen at least k times
[COLING 2016] K-SRL: Instance-based Learning for Semantic Role Labeling
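The back-off idea can be sketched as below. This is not the K-SRL implementation: the feature tuples, the majority vote and the toy training labels are assumptions chosen to mirror the "Creditors were told to hold off" example.

```python
from collections import Counter


def predict_role(features, training, k=3):
    """Back off from the most specific composite feature to more general ones.

    features: composite features ordered from most specific to most general
    training: dict mapping composite feature -> list of labels seen in training
    Returns the majority label of the first feature seen at least k times.
    """
    for feature in features:
        labels = training.get(feature, [])
        if len(labels) >= k:
            return Counter(labels).most_common(1)[0][0]
    return None  # no feature was seen often enough


# Toy training data (invented): TELL.01 is a local exception to the global trend.
training = {
    ("noun", "passive-subject", "tell.01"): ["A2", "A2", "A2", "A1"],
    ("any", "passive-subject", "agentive-verb"): ["A1"] * 10 + ["A2"],
}

query = [
    ("creditor", "passive-subject", "tell.01"),  # unseen, so back off
    ("noun", "passive-subject", "tell.01"),      # seen 4 times, so used
    ("any", "passive-subject", "agentive-verb"),
]

role = predict_role(query, training)
print(role)
```

The exception for TELL.01 is captured because the vote is taken among the closest instances only, before the general rule ever gets a say.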
Results
[Charts: In-domain and Out-of-domain results]
• Significantly outperform previous approaches
– Especially on out-of-domain data
• Small neighborhoods suffice (k=3)
• Fast runtime
Latest results (improvement over SoTA with DL + IL, in submission)
↑ 1.4pp F1 In-Domain
↑ 5.1pp F1 Out-of-Domain
[In Submission] Deep learning + Instance-based Learning
[COLING 2016] K-SRL: Instance-based Learning for Semantic Role Labeling
WhatsApp was bought by Facebook
Facebook hat WhatsApp gekauft
Facebook a acheté WhatsApp
buy.01
Facebook WhatsApp
Buyer Thing bought
Cross-lingual representation
Multilingual input text
Buy.01 A0A1
Buy.01A1A0
Buy.01A0 A1
Crosslingual Information
Extraction
Sentence   Verb     Buyer      Thing bought
1          buy.01   Facebook   WhatsApp
2          buy.01   Facebook   WhatsApp
3          buy.01   Facebook   WhatsApp
Crosslingual extraction
Task: Extract who bought what
[NAACL’18] SystemT: Declarative Text Understanding for Enterprise
[ACL’16] POLYGLOT: Multilingual Semantic Role Labeling with Unified Labels
[COLING’16] Multilingual Information Extraction with PolyglotIE https://vimeo.com/180382223
Transparent Linguistic Models for Contract Understanding
[NAACL-NLLP’19] Transparent Linguistic Models for Contract Understanding and Comparison https://www.ibm.com/cloud/compare-and-comply
Transparent Model Design
Purchaser will
purchase the Assets
by a cash payment.
Element
Obligation for
Purchaser
Transparent Model Design
Purchaser will
purchase the Assets
by a cash payment.
Element
[Purchaser]A0
[will]TENSE-FUTURE
purchase
[the Assets]A1
[by a cash payment]ARGM-MNR
Core NLP Understanding
Core NLP Primitives &
Operators
Provided by SystemT
[ACL’10, NAACL’18]
Semantic NLP Primitives
Transparent Model Design
Purchaser will
purchase the Assets
by a cash payment.
Element Legal Domain LLEs
[Purchaser]ARG0
[will]TENSE-FUTURE
purchase
[the Assets]ARG1
[by a cash payment]ARGM-MNR
LLE1:
PREDICATE ∈ DICT Business-Transaction
∧ TENSE = Future
∧ POLARITY = Positive
→ NATURE = Obligation ∧ PARTY = ARG0
LLE2:
…........
Domain Specific Concepts
Business transact. verbs
in future tense
with positive polarity
Core NLP Primitives &
Operators
Semantic NLP Primitives
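LLE1 can be read as a predicate over the semantic parse of a sentence. The sketch below assumes a hand-built parse dictionary and a small, hypothetical business-transaction verb list. It is not SystemT syntax, and `apply_lle1` is an illustrative name.

```python
# Hypothetical stand-in for the Business-Transaction dictionary.
BUSINESS_TRANSACTION = {"purchase", "buy", "sell", "pay"}


def apply_lle1(parse):
    """Return the NATURE/PARTY labels if the LLE1 conditions hold, else None."""
    if (
        parse["predicate"] in BUSINESS_TRANSACTION
        and parse["tense"] == "future"
        and parse["polarity"] == "positive"
    ):
        return {"nature": "Obligation", "party": parse["args"]["ARG0"]}
    return None


# Hand-written parse of "Purchaser will purchase the Assets by a cash payment."
parse = {
    "predicate": "purchase",
    "tense": "future",
    "polarity": "positive",
    "args": {
        "ARG0": "Purchaser",
        "ARG1": "the Assets",
        "ARGM-MNR": "by a cash payment",
    },
}

result = apply_lle1(parse)
print(result)
```

Each condition maps to one line of the rule, so a domain expert can see exactly why a sentence was labeled as an obligation.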
Transparent Model Design
Purchaser will
purchase the Assets
by a cash payment.
Element Model Output
[Purchaser]ARG0
[will]TENSE-FUTURE
purchase
[the Assets]ARG1
[by a cash payment]ARGM-MNR
Obligation for
Purchaser
Nature/Party:
Domain Specific Concepts
Core NLP Primitives &
Operators
LLE1:
PREDICATE ∈ DICT Business-Transaction
∧ TENSE = Future
∧ POLARITY = Positive
→ NATURE = Obligation ∧ PARTY = ARG0
LLE2:
…........
Legal Domain LLEs
Semantic NLP Primitives
Human & Machine Co-Creation
[Figure: workflow with Labeled Data, Deep Learning, Learned Rules (Explainable), Modify Rules, Evaluation Results, Production]
Machine performs heavy lifting to abstract out patterns
Humans verify/ transparent model
Evaluation & Deployment
Raises the abstraction level for domain experts to interact with
[EMNLP’20] Learning Explainable Linguistic Expressions with Neural Inductive Logic Programming for Sentence Classification
Label being assigned
Various ways of selecting/ranking rules
Center panel lists all rules
HEIDL Demo
Rule-specific performance
metrics
[ACL’19] HEIDL: Learning Linguistic Expressions with Deep Learning and Human-in-the-Loop
HEIDL Demo
Examples available at the
click of a button
Center panel lists all rules
HEIDL Demo
Playground mode allows
adding and dropping of
predicates from a rule
User Study: Human+Machine
Co-Created Model
Performance
User study
– 4 NLP engineers with 1-2 years of experience
– 2 NLP experts with 10+ years of experience
Key Takeaways
– Explanation of learned rules: the visualization tool is very effective
– Reduction in human labor: a co-created model built within 1.5 person-hours outperforms a black-box sentence classifier
– Lower requirement on human expertise: the co-created model is on par with the model created by Super-Experts
[Chart: F-measure (axis 0.0 to 0.6) for users Ua, Ub, Uc, Ud; RuleNN+Human vs. BiLSTM]
[ACL’19] HEIDL: Learning Linguistic Expressions with Deep Learning and Human-in-the-Loop
Conclusion
Research
prototype
Early adaption (EN)
Cross-lingual
adaptation
• Watson products
• Customer engagements
• Research projects …
• 10+ languages
• SoTA models
• Papers: 10+ publications
• Patents: 6 patents filed
• Data: ibm.biz/LanguageData
• Code: Chinese SOUNDEX https://pypi.org/project/chinesesoundex-1.0/
• ongoing
Thank You
Our collaborators in
• Within IBM
• Product: Watson NLP, Watson Discovery, Watson Health, CODAIT, …
• Research: AURL, IBMRA, BRL, DRL, HRL, IRL, TRL, YKT, ZRL
• Outside of IBM
• Allen AI Institute
• Humboldt University of Berlin
• IIT-Bombay
• NYU – Abu Dhabi
• Sapienza U. of Rome
• UCSD
• UIUC
• U. of Malta
• U. of Maryland, College Park
• U. of Michigan, Ann Arbor
• U. of Washington
• Vietnamese National U.
• …
Yunyao Li
Huaiyu Zhu
Kun Qian
Nancy Wang
Fred Reiss
Yannis Katsis
Doug Burdick
Ban Kawas
Lucian Popa
Ishan Jindal
Pritthvi Sen
Marina Danilevsky
Khoi-Nguyen Tran
Sairam Gurajada
Alexandre Evfimievski
Thank You!
To learn more:
• Role of AI in Enterprise Application ( ibm.biz/RoleOfAI)
Research Projects:
• ibm.biz/ScalableKnowledgeIntelligence
• ibm.biz/SystemT
Data Sets:
• ibm.biz/LanguageData
Follow me:
• LinkedIn: https://www.linkedin.com/in/yunyao-li/
• Twitter: @yunyao_li
By now, you should be able to:
– Identify challenges towards universal semantic
understanding of natural languages
– Understand the current state of the art in
addressing the challenges
– Define general use cases for universal semantic
understanding of natural languages

More Related Content

What's hot

Programing paradigm &amp; implementation
Programing paradigm &amp; implementationPrograming paradigm &amp; implementation
Programing paradigm &amp; implementationBilal Maqbool ツ
 
From Programming to Modeling And Back Again
From Programming to Modeling And Back AgainFrom Programming to Modeling And Back Again
From Programming to Modeling And Back AgainMarkus Voelter
 
Ch1 language design issue
Ch1 language design issueCh1 language design issue
Ch1 language design issueJigisha Pandya
 
Architecting Domain-Specific Languages
Architecting Domain-Specific LanguagesArchitecting Domain-Specific Languages
Architecting Domain-Specific LanguagesMarkus Voelter
 
Programming Language Selection
Programming Language SelectionProgramming Language Selection
Programming Language SelectionDhananjay Nene
 
Introducing Language-Oriented Business Applications - Markus Voelter
Introducing Language-Oriented Business Applications - Markus VoelterIntroducing Language-Oriented Business Applications - Markus Voelter
Introducing Language-Oriented Business Applications - Markus VoelterJAXLondon2014
 
Software language over the last 50 years, what will be next (by Pieter Zulian...
Software language over the last 50 years, what will be next (by Pieter Zulian...Software language over the last 50 years, what will be next (by Pieter Zulian...
Software language over the last 50 years, what will be next (by Pieter Zulian...Verhaert Masters in Innovation
 
How to make Translation easier with Machines, by Hao Zong, Global Tone Commun...
How to make Translation easier with Machines, by Hao Zong, Global Tone Commun...How to make Translation easier with Machines, by Hao Zong, Global Tone Commun...
How to make Translation easier with Machines, by Hao Zong, Global Tone Commun...TAUS - The Language Data Network
 
Generic Tools - Specific Languages (PhD Defense Slides)
Generic Tools - Specific Languages (PhD Defense Slides)Generic Tools - Specific Languages (PhD Defense Slides)
Generic Tools - Specific Languages (PhD Defense Slides)Markus Voelter
 
Principles Of Programing Languages
Principles Of Programing LanguagesPrinciples Of Programing Languages
Principles Of Programing LanguagesMatthew McCullough
 
Principles of programming languages. Detail notes
Principles of programming languages. Detail notesPrinciples of programming languages. Detail notes
Principles of programming languages. Detail notesVIKAS SINGH BHADOURIA
 
Language-Oriented Business Applications
Language-Oriented Business ApplicationsLanguage-Oriented Business Applications
Language-Oriented Business ApplicationsMarkus Voelter
 
Past, Present, and Future: Machine Translation & Natural Language Processing ...
Past, Present, and Future: Machine Translation & Natural Language Processing ...Past, Present, and Future: Machine Translation & Natural Language Processing ...
Past, Present, and Future: Machine Translation & Natural Language Processing ...John Tinsley
 
Comparative Study of programming Languages
Comparative Study of programming LanguagesComparative Study of programming Languages
Comparative Study of programming LanguagesIshan Monga
 
CH # 1 preliminaries
CH # 1 preliminariesCH # 1 preliminaries
CH # 1 preliminariesMunawar Ahmed
 
Programming Languages An Intro
Programming Languages An IntroProgramming Languages An Intro
Programming Languages An IntroKimberly De Guzman
 

What's hot (20)

Programing paradigm &amp; implementation
Programing paradigm &amp; implementationPrograming paradigm &amp; implementation
Programing paradigm &amp; implementation
 
From Programming to Modeling And Back Again
From Programming to Modeling And Back AgainFrom Programming to Modeling And Back Again
From Programming to Modeling And Back Again
 
Ch1 language design issue
Ch1 language design issueCh1 language design issue
Ch1 language design issue
 
Paradigms
ParadigmsParadigms
Paradigms
 
Architecting Domain-Specific Languages
Architecting Domain-Specific LanguagesArchitecting Domain-Specific Languages
Architecting Domain-Specific Languages
 
Programming Language Selection
Programming Language SelectionProgramming Language Selection
Programming Language Selection
 
Introducing Language-Oriented Business Applications - Markus Voelter
Towards Universal Semantic Understanding of Natural Languages

  • 1. Towards Universal Semantic Understanding of Natural Languages Yunyao Li (@yunyao_li) Senior Research Manager Scalable Knowledge Intelligence IBM Research - Almaden
  • 3. 7,102 known languages; the 23 most spoken languages: 4.1+ billion people. Source: https://www.iflscience.com/environment/worlds-most-spoken-languages-and-where-they-are-spoken/
  • 4. Asia-Pacific Region: > 3,200 languages, 28 major language families. Source: https://reliefweb.int/sites/reliefweb.int/files/resources/OCHA_ROAP_Language_v6_110519.pdf
  • 5. Conventional Approach towards Language Enablement. English Text → English NLU → English Applications; German Text → German NLU → German Applications; Chinese Text → Chinese NLU → Chinese Applications. Separate NLU pipeline for each language; separate application for each language.
  • 6. Universal Semantic Understanding of Natural Languages. English Text, German Text, Chinese Text → Universal NLU → Cross-lingual Applications. Single NLU pipeline for different languages; develop once for different languages.
  • 7. The Challenges and Our Research. Meaning Representation: different meaning representations for different languages and for the same language. Our research: Unified Meaning Representation [ACL’15, ACL’16, ACL-DMR’19]. Training Data: high-quality labeled data is required but hard to obtain. Our research: Data: auto-generation + crowd-in-the-loop [ACL’15, EMNLP’16, EMNLP’17, EMNLP’20 Findings]; Training: cross-lingual transfer [EMNLP’20 Findings]. Models: low-frequency exceptions; built for one task at a time. Our research: instance-based learning [COLING’16]; deep learning + instance-based learning [In Submission]; human-machine co-creation [ACL’19, EMNLP’20].
  • 9. Semantic Role Labeling (SRL): Who did what to whom, when, where and how? Example: "John hastily ordered a dozen dandelions for Mary from Amazon’s Flower Shop." Frame order.02 (request to be delivered): A0: Orderer (WHO: John); AM-MNR: Manner (HOW: hastily); A1: Thing ordered (WHAT: a dozen dandelions); A2: Benefactive, ordered-for (FOR WHOM: Mary); A3: Source (WHERE: Amazon’s Flower Shop).
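The frame on this slide can be written down as a small data structure. The following is a minimal sketch, not any SRL library's API: the `Frame` class, its `ask` method, and the question-to-role mapping are hypothetical names introduced here for illustration, and the role fillers are copied by hand from the example sentence rather than produced by a parser.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    """One predicate with its labeled arguments (illustrative, hand-filled)."""
    predicate: str
    roles: dict = field(default_factory=dict)

    def ask(self, question):
        # Hypothetical mapping from the slide's question words to role labels.
        question_to_role = {
            "WHO": "A0",
            "HOW": "AM-MNR",
            "WHAT": "A1",
            "FOR WHOM": "A2",
            "WHERE": "A3",
        }
        return self.roles.get(question_to_role[question])


# Roles for the slide's sentence, assigned by hand to mirror the slide.
frame = Frame(
    predicate="order.02",
    roles={
        "A0": "John",
        "AM-MNR": "hastily",
        "A1": "a dozen dandelions",
        "A2": "Mary",
        "A3": "Amazon's Flower Shop",
    },
)

for q in ("WHO", "HOW", "WHAT", "FOR WHOM", "WHERE"):
    print(q, "->", frame.ask(q))
```

Keeping the roles in a plain dictionary keyed by label means the same lookup works for any frame, regardless of which roles it defines.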
  • 10. Syntax vs. Semantic Parsing. "Dirk broke the window with a hammer." (A0, Break.01, A1, A2); "The window was broken by Dirk." (A1, Break.01, A0); "The window broke." (A1, Break.01). Break.01: A0 – Breaker; A1 – Thing broken; A2 – Instrument; A3 – Pieces. Break.15: A0 – Journalist, exposer; A1 – Story, thing exposed. What type of labels are valid across languages? • Lexical, morphological and syntactic labels differ greatly • Shallow semantic labels remain stable
  • 11. SRL Resources. English: • FrameNet • PropBank. Other languages: • Chinese Proposition Bank • Hindi Proposition Bank • German FrameNet • French? Spanish? Russian? Arabic? … Problems: 1. Limited coverage 2. Language-specific formalisms. Example: 订购 (A0: buyer; A1: commodity; A2: seller) vs. order.02 (A0: orderer; A1: thing ordered; A2: benefactive, ordered-for; A3: source). We want different languages to share the same semantic labels.
  • 12. Shared Frames Across Languages. Multilingual input text: "WhatsApp was bought by Facebook" (A1, Buy.01, A0); "Facebook hat WhatsApp gekauft" (A0, A1, Buy.01); "Facebook a acheté WhatsApp" (A0, Buy.01, A1). Cross-lingual representation: buy.01 with A0 (Buyer) = Facebook and A1 (Thing bought) = WhatsApp.
  • 13. The Challenges and Our Research. Meaning Representation: different meaning representations for different languages and for the same language. Our research: Unified Meaning Representation [ACL’15, ACL’16, ACL-DMR’19]. Training Data: high-quality labeled data is required but hard to obtain. Our research: Data: auto-generation + crowd-in-the-loop [ACL’15, EMNLP’16, EMNLP’17, EMNLP’20 Findings]; Training: cross-lingual transfer [EMNLP’20 Findings]. Models: low-frequency exceptions; built for one task at a time. Our research: instance-based learning [COLING’16]; deep learning + instance-based learning [In Submission]; human-machine co-creation [ACL’19, EMNLP’20].
  • 14. Universal Proposition Banks: generate SRL resources for many other languages • Shared frame set • Minimal effort. Corpus of annotated text data: "Il faut qu‘ il y ait des responsables" (Need.01, A0); "Je suis responsable pour le chaos" (Be.01, A1, A2, AM-PRD); "Les services postaux ont acheté des …" (Be.01, A2, A1; Buy.01, A0). Frame set: Buy.01: A0 – Buyer; A1 – Thing bought; A2 – Seller; A3 – Price paid; A4 – Benefactive. Pay.01: A0 – Payer; A1 – Money; A2 – Being paid; A3 – Commodity.
  • 16. Auto-Generation of Universal Proposition Bank. Our idea: annotation projection with parallel corpora. Example: TV subtitles. English subtitles: "I would buy that for a dollar!" (BUYER: I; ITEM: that; PRICE: for a dollar). German subtitles: "Das würde ich für einen Dollar kaufen" (BUYER, ITEM and PRICE obtained by projection). Training data: • Semantically annotated • Multilingual • Large amount. Resource: https://www.youtube.com/watch?v=u5HOt0ZOcYk
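The projection step on this slide can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the authors' pipeline: the function `project_labels` is a hypothetical name, and the word alignment is written by hand here, whereas a real system would obtain it from an automatic word aligner run over the parallel corpus.

```python
def project_labels(source_spans, alignment):
    """Map labeled source token indices onto target token indices.

    source_spans: label -> list of source token indices
    alignment:    source token index -> target token index
    """
    projected = {}
    for label, src_indices in source_spans.items():
        tgt_indices = sorted(alignment[i] for i in src_indices if i in alignment)
        if tgt_indices:  # drop labels whose tokens have no aligned target
            projected[label] = tgt_indices
    return projected


english = ["I", "would", "buy", "that", "for", "a", "dollar"]
german = ["Das", "würde", "ich", "für", "einen", "Dollar", "kaufen"]

# Hand-written alignment (an assumption for this example).
alignment = {0: 2, 1: 1, 2: 6, 3: 0, 4: 3, 5: 4, 6: 5}

# English-side annotation taken from the slide.
english_spans = {"PREDICATE": [2], "BUYER": [0], "ITEM": [3], "PRICE": [4, 5, 6]}

german_spans = project_labels(english_spans, alignment)
for label, indices in german_spans.items():
    print(label, "->", " ".join(german[i] for i in indices))
```

Because labels travel along alignment links only, any alignment error or translation shift carries straight into the target annotation, which is the failure mode the next slides address.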
  • 17. However: Projections Not Always Possible. English sentence: "We need to hold people responsible" (Need.01; Hold.01 with A0, A1, A3). Target sentence: "Il faut qu‘ il y ait des responsables" ("There need to be those responsible"). Projecting Hold.01 onto the target sentence is an incorrect projection! Main error sources: • Translation shift • Source-language SRL errors
  • 18. Filtered Projection & Bootstrapping. Two-step process: – Filters to detect translation shift, block projections (more precision at cost of recall) – Bootstrap learning to increase recall. Generated 7 universal proposition banks from 3 language groups • Version 1.0: https://github.com/System-T/UniversalPropositions/ • Version 2.0 coming soon. [ACL’15] Generating High Quality Proposition Banks for Multilingual Semantic Role Labeling.
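One way to picture a projection filter is a rule that refuses to project a predicate onto a target token that is not a verb. The sketch below is an illustrative assumption, not the filter set from the cited paper: `filter_predicates` is a hypothetical name, and the alignment and part-of-speech tags are supplied by hand for the sentence pair from the previous slide.

```python
def filter_predicates(predicates, alignment, target_pos):
    """Keep a projected predicate only if its aligned target token is a verb.

    predicates: source token index -> frame label
    alignment:  source token index -> target token index
    target_pos: part-of-speech tag for every target token
    """
    kept, blocked = {}, {}
    for src_idx, frame in predicates.items():
        tgt_idx = alignment.get(src_idx)
        if tgt_idx is not None and target_pos[tgt_idx] == "VERB":
            kept[tgt_idx] = frame
        else:
            blocked[src_idx] = frame  # trades recall for precision
    return kept, blocked


english = ["We", "need", "to", "hold", "people", "responsible"]
french = ["Il", "faut", "qu'", "il", "y", "ait", "des", "responsables"]

# Hand-assigned tags and alignment links (assumptions for this example).
french_pos = ["PRON", "VERB", "SCONJ", "PRON", "PRON", "VERB", "DET", "NOUN"]
alignment = {1: 1, 3: 7}  # need -> faut, hold -> responsables

predicates = {1: "need.01", 3: "hold.01"}
kept, blocked = filter_predicates(predicates, alignment, french_pos)
print("kept:", {french[i]: f for i, f in kept.items()})
print("blocked:", {english[i]: f for i, f in blocked.items()})
```

Blocked predicates are simply left unannotated in the target sentence; a later bootstrapping step, as the slide describes, is what recovers recall.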
  • 19. Multilingual Aliasing • Problem: Target language frame lexicon automatically generated from alignments – False frames – Redundant frames • Expert curation of frame mappings [COLING’16] Multilingual Aliasing for Auto-Generating Proposition Banks
  • 20. Low-resource Languages. Apply the approach to low-resource languages: Bengali, Malayalam, Tamil – Fewer sources of parallel data – Almost no NLP: no syntactic parsing, lemmatization, etc. Crowdsourcing for data curation [EMNLP’16] Towards Semi-Automatic Generation of Proposition Banks for Low-Resource Languages
  • 21. Crowd-in-the-Loop Curation. Pipeline elements: raw text corpus; predicted annotations corpus; annotation tasks (all); Task Router; curated annotations corpus. Easy tasks are curated by the crowd; difficult tasks are curated by experts. [EMNLP’17] CROWD-IN-THE-LOOP: A Hybrid Approach for Annotating Semantic Roles
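As a rough sketch of the task router, the snippet below sends easy tasks to a crowd queue and difficult ones to an expert queue. How difficulty is actually estimated is not stated on the slide; the `difficulty` score, the threshold value, and all names here are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class AnnotationTask:
    sentence: str
    predicate: str
    difficulty: float  # assumed score in [0, 1]; higher means harder


def route_tasks(
    tasks: List[AnnotationTask], threshold: float = 0.5
) -> Dict[str, List[AnnotationTask]]:
    """Split tasks into a crowd queue (easy) and an expert queue (difficult)."""
    queues: Dict[str, List[AnnotationTask]] = {"crowd": [], "expert": []}
    for task in tasks:
        queue = "crowd" if task.difficulty <= threshold else "expert"
        queues[queue].append(task)
    return queues


tasks = [
    AnnotationTask("The silver was sold by the man.", "sell.01", 0.1),
    AnnotationTask("Creditors were told to hold off.", "tell.01", 0.9),
]
queues = route_tasks(tasks, threshold=0.5)
for name, queue in queues.items():
    print(name, [task.predicate for task in queue])
```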
  • 23. Effectiveness of Crowd-in-the-Loop: ↑9pp F1 improvement over SRL results, ↓66.4pp expert efforts; ↑10pp F1 improvement over SRL results, ↓87.3pp expert efforts. Latest: Filter → Select → Expert [Findings of EMNLP’20] A Novel Workflow for Accurately and Efficiently Crowdsourcing Predicate Senses and Argument Labels
  • 25. Cross-lingual Meaning Representation. Task: extract who bought what. Multilingual input text: “WhatsApp was bought by Facebook”; “Facebook hat WhatsApp gekauft”; “Facebook a acheté WhatsApp” (each labeled Buy.01 with A0 and A1). Cross-lingual representation: buy.01, Buyer: Facebook, Thing bought: WhatsApp. Cross-lingual extraction. [NAACL’18] SystemT: Declarative Text Understanding for Enterprise [ACL’16] POLYGLOT: Multilingual Semantic Role Labeling with Unified Labels [COLING’16] Multilingual Information Extraction with PolyglotIE
  • 26. Cross-lingual Transfer? Challenge: low-resource languages lack – Large monolingual labeled data – Parallel corpora. Solution: transfer knowledge and resources from rich-resource languages to low-resource languages (EN, DE, YO, …)
  • 27. Multilingual or Polyglot Training. Main idea: • Combine training data from multiple languages with multilingual word embeddings • Train a common encoder model to enable parameter sharing. Challenge: different languages have different annotation schemes (EN, DE, YO, …)
  • 28. Different Annotations across Languages Observation: Certain argument labels do share common semantic meaning across languages. Intuition: Identify and exploit the commonalities between annotation of different languages. Know.01 A0: Knower A1: Thing known A2: A1 known about AM: Adjuncts Knnen.01 A0: Knower A1: Entity AM: Adjuncts
  • 29. Hypothesis. (1) Pair Matching: identify arguments with similar semantic meaning across languages, and (2) Argument Regularization: represent them close to each other in the feature space. [Figure: source manifold and target manifold with labels A0, AM-TMP, ZH-A0, ZH-TMP]
  • 30. The Framework. The regularizer is applied to the parameters of the last layer of the model. [Figure: Word Representations (Fixed); BiLSTM Encoder; Sentence Representation; Softmax; symbols shown: v_i, u_i, b, w_1, w_2, …, w_n]
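To make the argument regularization idea concrete, here is a minimal sketch in which each argument label owns one weight vector in the last layer, and matched label pairs are pulled together by a penalty on their distance. The squared-L2 form, the `strength` parameter, the function name, and the toy weights are assumptions; the slides state only that matched arguments should be represented close to each other and that the regularizer acts on the last layer's parameters.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def argument_regularizer(
    last_layer_weights: Dict[str, Vector],
    matched_pairs: List[Tuple[str, str]],
    strength: float = 1.0,
) -> float:
    """Sum of squared distances between weight vectors of matched labels."""
    penalty = 0.0
    for source_label, target_label in matched_pairs:
        w_src = last_layer_weights[source_label]
        w_tgt = last_layer_weights[target_label]
        penalty += sum((a - b) ** 2 for a, b in zip(w_src, w_tgt))
    return strength * penalty


# Toy last-layer weights, one vector per argument label (hypothetical values).
weights = {
    "A0": [1.0, 0.0],
    "ZH-A0": [0.5, 0.0],
    "AM-TMP": [0.0, 2.0],
    "ZH-TMP": [0.0, 1.0],
}
# Pairs produced by the pair-matching step (assumed here).
pairs = [("A0", "ZH-A0"), ("AM-TMP", "ZH-TMP")]

penalty = argument_regularizer(weights, pairs)
print(penalty)
```

During training, such a term would be added to the task loss so that gradient updates shrink the gap between matched labels.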
  • 31. CLAR Performance. Dataset: CoNLL2009. Ours is SoTA in: – Average performance over all languages – 3 out of 5 non-English languages. General approach: – Independent of base model – Independent of language – Requires no parallel data
  • 33. Dependency Parsing vs. SRL [Chart: scores for SRL and Dependency Parsing on WSJ and BROWN; axis range 75 to 100]
  • 34. What Makes SRL So Difficult? Heavy-tailed distribution of class labels – Common frames • say.01 (8243), have.01 (2040), sell.01 (1009) – Many uncommon frames • swindle.01, feed.01, hum.01, toast.01 – Almost half of all frames seen fewer than 3 times in training data. [Chart: distribution of frame labels; axis range 0 to 9000] Many low-frequency exceptions → difficult to capture in models
  • 35. Low-Frequency Exceptions. Strong correlation of the syntactic function of an argument to its role. Example: passive subject. “The window was broken by Dirk” (dependency labels: SBJ, PMOD, VC, NMOD; passive subject labeled A1). “The silver was sold by the man.” (dependency labels: SBJ, PMOD, VC, NMOD; passive subject labeled A1). “Creditors were told to hold off.” (dependency labels: SBJ, ORPD, VC, IM, PRT). TELL.01 A0: speaker (agent) A1: utterance (topic) A2: hearer (recipient)
  • 36. Local Bias. 86% of passive subjects are labeled A1 (over 4,000x in training data). 87% of passive subjects of Tell.01 are labeled A2 (53x in training data). Most classifiers – Bag-of-features – Learn weights for features to classes – Perform generalization. Question: How do we explicitly capture low-frequency exceptions?
  • 37. Instance-based Learning. kNN (k-Nearest Neighbors) classification: find the k most similar instances in training data; derive the class label from the nearest neighbors. [Figure: labeled training instances (A0, A1, A2) ordered by distance 1, 2, 3, …, n from the unlabeled instance “?”] Example: “Creditors were told to hold off.” (SBJ, ORPD, VC, IM, PRT). Composite features by distance: 1: “creditor”, passive subject of TELL.01; 2: noun, passive subject of TELL.01; …; n: any passive subject of any agentive verb. Main idea: back off to a composite feature seen at least k times [COLING 2016] K-SRL: Instance-based Learning for Semantic Role Labeling
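The back-off idea can be sketched as follows: composite features are tried from most to least specific, and the first one observed at least k times in training decides the label by majority vote. This is a simplified illustration rather than the K-SRL implementation; the feature strings, the training counts, and the choice to vote over all instances sharing the feature are assumptions.

```python
from collections import Counter
from typing import Dict, List, Tuple


def classify_with_backoff(
    feature_chain: List[str],
    training_index: Dict[str, List[str]],
    k: int = 3,
) -> Tuple[str, str]:
    """Return (label, feature used), backing off until a feature has >= k instances."""
    for feature in feature_chain:  # ordered from most to least specific
        labels = training_index.get(feature, [])
        if len(labels) >= k:
            majority_label, _ = Counter(labels).most_common(1)[0]
            return majority_label, feature
    raise ValueError("no composite feature was seen at least k times")


# Hypothetical training index: composite feature -> role labels observed with it.
training_index = {
    "lemma=creditor|passive-subject|TELL.01": ["A2"],
    "pos=noun|passive-subject|TELL.01": ["A2", "A2", "A2", "A1"],
    "passive-subject|agentive-verb": ["A1"] * 6 + ["A0"],
}
feature_chain = [
    "lemma=creditor|passive-subject|TELL.01",
    "pos=noun|passive-subject|TELL.01",
    "passive-subject|agentive-verb",
]

label, used_feature = classify_with_backoff(feature_chain, training_index, k=3)
print(label, used_feature)
```

With these toy counts the most specific feature is too rare, so the classifier backs off one level and still picks up the TELL.01 exception instead of the global passive-subject tendency toward A1.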
  • 38. Results (in-domain and out-of-domain) • Significantly outperforms previous approaches – Especially on out-of-domain data • Small neighborhoods suffice (k=3) • Fast runtime. Latest results (improvement over SoTA with DL + IL, in submission): ↑1.4pp F1 in-domain, ↑5.1pp F1 out-of-domain [In Submission] Deep learning + Instance-based Learning [COLING 2016] K-SRL: Instance-based Learning for Semantic Role Labeling
  • 40. Crosslingual Information Extraction. Task: extract who bought what. Multilingual input text: (1) “WhatsApp was bought by Facebook”; (2) “Facebook hat WhatsApp gekauft”; (3) “Facebook a acheté WhatsApp” (each labeled Buy.01 with A0 and A1). Cross-lingual representation: buy.01, Buyer: Facebook, Thing bought: WhatsApp. Crosslingual extraction (Sentence | Verb | Buyer | Thing bought): 1 | buy.01 | Facebook | WhatsApp; 2 | buy.01 | Facebook | WhatsApp; 3 | buy.01 | Facebook | WhatsApp. [NAACL’18] SystemT: Declarative Text Understanding for Enterprise [ACL’16] POLYGLOT: Multilingual Semantic Role Labeling with Unified Labels [COLING’16] Multilingual Information Extraction with PolyglotIE https://vimeo.com/180382223
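Because all three sentences share one frame and one set of role labels, a single extractor can serve every language. The sketch below assumes the SRL output has already been reduced to small dictionaries; that data layout and the names `Purchase` and `extract_purchases` are hypothetical and are not the SystemT or PolyglotIE API.

```python
from typing import Dict, List, NamedTuple


class Purchase(NamedTuple):
    verb: str
    buyer: str
    thing_bought: str


def extract_purchases(frames: List[Dict[str, str]]) -> List[Purchase]:
    """Keep buy.01 frames that have both a buyer (A0) and a thing bought (A1)."""
    return [
        Purchase(frame["frame"], frame["A0"], frame["A1"])
        for frame in frames
        if frame.get("frame") == "buy.01" and "A0" in frame and "A1" in frame
    ]


# Assumed SRL output with unified labels for the three input sentences.
frames = [
    {"sentence": "WhatsApp was bought by Facebook",
     "frame": "buy.01", "A0": "Facebook", "A1": "WhatsApp"},
    {"sentence": "Facebook hat WhatsApp gekauft",
     "frame": "buy.01", "A0": "Facebook", "A1": "WhatsApp"},
    {"sentence": "Facebook a acheté WhatsApp",
     "frame": "buy.01", "A0": "Facebook", "A1": "WhatsApp"},
]

rows = extract_purchases(frames)
for number, row in enumerate(rows, start=1):
    print(number, row.verb, row.buyer, row.thing_bought)
```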
  • 41. Transparent Linguistic Models for Contract Understanding [NAACL-NLLP’19] Transparent Linguistic Models for Contract Understanding and Comparison https://www.ibm.com/cloud/compare-and-comply
  • 42. Transparent Model Design Purchaser will purchase the Assets by a cash payment. Element Obligation for Purchaser [NAACL-NLLP’19] Transparent Linguistic Models for Contract Understanding and Comparison https://www.ibm.com/cloud/compare-and-comply
  • 43. Transparent Model Design Purchaser will purchase the Assets by a cash payment. Element [Purchaser]A0 [will]TENSE-FUTURE purchase [the Assets]A1 [by a cash payment]ARGM-MNR Core NLP Understanding Core NLP Primitives & Operators Provided by SystemT [ACL '10, NAACL ‘18] Semantic NLP Primitives [NAACL-NLLP’19] Transparent Linguistic Models for Contract Understanding and Comparison https://www.ibm.com/cloud/compare-and-comply
  • 44. Transparent Model Design Purchaser will purchase the Assets by a cash payment. Element Legal Domain LLEs [Purchaser]ARG0 [will]TENSE-FUTURE purchase [the Assets]ARG1 [by a cash payment]ARGM-MNR LLE1: PREDICATE ∈ DICT Business-Transaction ∧ TENSE = Future ∧ POLARITY = Positive → NATURE = Obligation ∧ PARTY = ARG0 LLE2: …........ Domain Specific Concepts Business transact. verbs in future tense with positive polarity Core NLP Primitives & Operators Semantic NLP Primitives [NAACL-NLLP’19] Transparent Linguistic Models for Contract Understanding and Comparison https://www.ibm.com/cloud/compare-and-comply
  • 45. Transparent Model Design. Element: “Purchaser will purchase the Assets by a cash payment.” Core NLP Primitives & Operators. Semantic NLP Primitives: [Purchaser]ARG0 [will]TENSE-FUTURE purchase [the Assets]ARG1 [by a cash payment]ARGM-MNR. Domain Specific Concepts. Legal Domain LLEs: LLE1: PREDICATE ∈ DICT Business-Transaction ∧ TENSE = Future ∧ POLARITY = Positive → NATURE = Obligation ∧ PARTY = ARG0; LLE2: …........ Model Output: Nature/Party: Obligation for Purchaser. [NAACL-NLLP’19] Transparent Linguistic Models for Contract Understanding and Comparison https://www.ibm.com/cloud/compare-and-comply
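LLE1 reads naturally as a predicate over a semantic frame. The sketch below evaluates it in plain Python; the `Frame` structure, the contents of the business-transaction dictionary, and the function name are assumptions for illustration, since the slides give only the rule itself.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set

# Hypothetical stand-in for DICT Business-Transaction.
BUSINESS_TRANSACTION_VERBS: Set[str] = {"purchase", "buy", "sell", "pay"}


@dataclass
class Frame:
    predicate: str
    tense: str
    polarity: str
    args: Dict[str, str]


def apply_lle1(frame: Frame) -> Optional[Dict[str, str]]:
    """LLE1: business-transaction verb, future tense, positive polarity
    implies NATURE = Obligation and PARTY = ARG0."""
    if (
        frame.predicate in BUSINESS_TRANSACTION_VERBS
        and frame.tense == "Future"
        and frame.polarity == "Positive"
        and "ARG0" in frame.args
    ):
        return {"NATURE": "Obligation", "PARTY": frame.args["ARG0"]}
    return None


# "Purchaser will purchase the Assets by a cash payment."
frame = Frame(
    predicate="purchase",
    tense="Future",
    polarity="Positive",
    args={
        "ARG0": "Purchaser",
        "ARG1": "the Assets",
        "ARGM-MNR": "by a cash payment",
    },
)
result = apply_lle1(frame)
print(result)
```

Since every condition is an explicit test on a linguistic attribute, a domain expert can read and edit the rule directly, which is the sense in which the model is transparent.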
  • 46. Human & Machine Co-Creation. Stages shown: Labeled Data; Deep Learning; Learned Rules (Explainable); Modify Rules; Evaluation Results; Production. Machine performs heavy lifting to abstract out patterns. Humans verify/ transparent model. Evaluation & Deployment. Raises the abstraction level for domain experts to interact with [EMNLP’20] Learning Explainable Linguistic Expressions with Neural Inductive Logic Programming for Sentence Classification
  • 47. HEIDL Demo. Label being assigned. Various ways of selecting/ranking rules. Center panel lists all rules. Rule-specific performance metrics [ACL’19] HEIDL: Learning Linguistic Expressions with Deep Learning and Human-in-the-Loop
  • 48. HEIDL Demo Examples available at the click of a button [ACL’19] HEIDL: Learning Linguistic Expressions with Deep Learning and Human-in-the-Loop
  • 49. Center panel lists all rules HEIDL Demo Playground mode allows adding and dropping of predicates from a rule [ACL’19] HEIDL: Learning Linguistic Expressions with Deep Learning and Human-in-the-Loop
  • 50. User Study: Human+Machine Co-Created Model Performance. User study – 4 NLP engineers with 1-2 years of experience – 2 NLP experts with 10+ years of experience. Key takeaways – Explanation of learned rules: visualization tool is very effective – Reduction in human labor: co-created model created within 1.5 person-hrs outperforms black-box sentence classifier – Lower requirement on human expertise: co-created model is on par with the model created by Super-Experts. [Chart: F-measure (0.0 to 0.6) of RuleNN+Human vs. BiLSTM for users Ua, Ub, Uc, Ud] [ACL’19] HEIDL: Learning Linguistic Expressions with Deep Learning and Human-in-the-Loop
  • 51. Conclusion Research prototype Early adaption (EN) Cross-lingual adaptation • Watson products • Customer engagements • Research projects … • 10+ languages • SoTA models • Paper: 10+ publications • Patent: 6 patents filed • Data: ibm.biz/LanguageData • Code: Chinese SOUNDEX https://pypi.org/project/chinesesoundex-1.0/ • ongoing
  • 52. Thank You. Our collaborators • Within IBM • Product: Watson NLP, Watson Discovery, Watson Health, CODAIT, … • Research: AURL, IBMRA, BRL, DRL, HRL, IRL, TRL, YKT, ZRL • Outside of IBM • Allen AI Institute • Humboldt University of Berlin • IIT-Bombay • NYU – Abu Dhabi • Sapienza U. of Rome • UCSD • UIUC • U. of Malta • U. of Maryland, College Park • U. of Michigan, Ann Arbor • U. of Washington • Vietnamese National U. • … Team: Yunyao Li, Huaiyu Zhu, Kun Qian, Nancy Wang, Fred Reiss, Yannis Katsis, Doug Burdick, Ban Kawas, Lucian Popa, Ishan Jindal, Pritthvi Sen, Marina Danilevsky, Khoi-Nguyen Tran, Sairam Gurajada, Alexandre Evfimievski
  • 53. Thank You! To learn more: • Role of AI in Enterprise Application (ibm.biz/RoleOfAI) Research Projects: • ibm.biz/ScalableKnowledgeIntelligence • ibm.biz/SystemT Data Sets: • ibm.biz/LanguageData Follow me: • LinkedIn: https://www.linkedin.com/in/yunyao-li/ • Twitter: @yunyao_li By now, you should be able to: – Identify challenges towards universal semantic understanding of natural languages – Understand the current state of the art in addressing the challenges – Define general use cases for universal semantic understanding of natural languages