SlideShare a Scribd company logo
SOFTCARDINALITY:
Learning to Identify Directional Cross-Lingual
Entailment from Cardinalities and SMT
Sergio Jimenez and
Claudia Becerra
(a participating system in the Cross-lingual
Textual Entailment, CLTE, TASK-8)
Alexander Gelbukh
Instituto Politécnico
Nacional, Mexico
Soft Cardinality
A=
, ,
B=
, ,
|A|=3
|B|=3
Classical
(integer)
Soft
(real)
Cardinality: number of different elements in a
collection, i.e. set definition.
C=
,
= |C|=1 |C|’=1.0
Soft Cardinality
inter-elements
similarity
elements
weights
“softness”
control
word-to-word
similarity
idf term
weighting
Word-to-word similarity functions
• Character q-grams
• Edit-distance
• Jaro-Winkler
m is the number of order mismatches between the common characters
Features for Text Pairs T1 , T2
T1
T2
Language-pair Model
T1
(EN)
T2
(FR)
SMT
SMT
T1
t
(FR)
T2
t
(EN)
translate
TextPre-processing
• Tokenizing
• Stemming
• Stop-words
removal
• idf term
weighting FeatureExtraction
4-wayclassificationmodel
Gold
standard
SVM
Submitted Systems
• RUN1: 4 language-pair models (es-
en, fr-en, it-en, de-en) each one
trained with 1,000 text pairs. SVM
using C=1.0
• RUN2: same as RUN1 but optimizing
C for max. accuracy.
Official Results
Circular Pivoting Translations
T1
(EN)
T2
(FR)
SMT
SMT
T1
t
(FR)
T2
t
(EN)
SMT
SMT
T1
tt
(EN)
T2
tt
(FR)
SMT
SMT
T1
ttt
(FR)
T2
ttt
(EN)
Original feature set: 2 comparable text pairs x14 features= 28 features
Extended feature set: 2+2+4 comparable text pairs x14 features= 112 features
Original feature set: 2+2 comparable text pairs x14 features= 56 features
Single Multilingual Model
1,000 feature vectors
1,000 feature vectors
1,000 feature vectors
1,000 feature vectors
4,000
features
vector
training
data set
4-wayclassificationmodel
Single Multilingual Model Results
4.6%
better than
best official
5.3%
better than
best official 1.3%
better than
best official
4.4%
below best
official
6.0%
better than
best official
baseline
Conclusions
• Soft Cardinality + SMT + SVM seems to be a
good combination for CLTE.
• A single multilingual model produced
improved results than language-pair models.
• Additional circular pivoting translations
produced slightly improved but consistent
improvements.
• Character q-grams seems to be better than
Edit-distance and Jaro-Winkler.
Soft Cardinality at *SEM and SemEval
• STS-2012, official 3th out of 89 systems
• STS-2013-CORE task, 18th out of 90 systems
(4th un-official)
• STS-2013-TYPED task, top-system UNITOR team
• CLTE-2012, 3rd out of 29 systems (1st un-official)
• CLTE-2013, among the 2-top systems
• SRA-2013, among the 2-top systems
, ,
’

More Related Content

What's hot

[Paper review] BERT
[Paper review] BERT[Paper review] BERT
[Paper review] BERT
JEE HYUN PARK
 
Dli meetup-may 27-milano_nlp_de_nobili
Dli meetup-may 27-milano_nlp_de_nobiliDli meetup-may 27-milano_nlp_de_nobili
Dli meetup-may 27-milano_nlp_de_nobili
Deep Learning Italia
 
Bert
BertBert
Word representations in vector space
Word representations in vector spaceWord representations in vector space
Word representations in vector space
Abdullah Khan Zehady
 
Simple effective decipherment via combinatorial optimization
Simple effective decipherment via combinatorial optimizationSimple effective decipherment via combinatorial optimization
Simple effective decipherment via combinatorial optimization
Attaporn Ninsuwan
 
NLP State of the Art | BERT
NLP State of the Art | BERTNLP State of the Art | BERT
NLP State of the Art | BERT
shaurya uppal
 
Pre trained language model
Pre trained language modelPre trained language model
Pre trained language model
JiWenKim
 
A Systematic Approach to Generate Diverse Instantiations for Conceptual Schemas
A Systematic Approach to Generate Diverse Instantiations for Conceptual SchemasA Systematic Approach to Generate Diverse Instantiations for Conceptual Schemas
A Systematic Approach to Generate Diverse Instantiations for Conceptual Schemas
Lola Burgueño
 
Object Oriented Technologies
Object Oriented TechnologiesObject Oriented Technologies
Object Oriented Technologies
Tushar B Kute
 
Hw6 interpreter iterator GoF
Hw6 interpreter iterator GoFHw6 interpreter iterator GoF
Hw6 interpreter iterator GoF
Edison Lascano
 
End-to-end sequence labeling via bi-directional LSTM-CNNs-CRF
End-to-end sequence labeling via bi-directional LSTM-CNNs-CRFEnd-to-end sequence labeling via bi-directional LSTM-CNNs-CRF
End-to-end sequence labeling via bi-directional LSTM-CNNs-CRF
Jayavardhan Reddy Peddamail
 
BERT introduction
BERT introductionBERT introduction
BERT introduction
Hanwha System / ICT
 

What's hot (12)

[Paper review] BERT
[Paper review] BERT[Paper review] BERT
[Paper review] BERT
 
Dli meetup-may 27-milano_nlp_de_nobili
Dli meetup-may 27-milano_nlp_de_nobiliDli meetup-may 27-milano_nlp_de_nobili
Dli meetup-may 27-milano_nlp_de_nobili
 
Bert
BertBert
Bert
 
Word representations in vector space
Word representations in vector spaceWord representations in vector space
Word representations in vector space
 
Simple effective decipherment via combinatorial optimization
Simple effective decipherment via combinatorial optimizationSimple effective decipherment via combinatorial optimization
Simple effective decipherment via combinatorial optimization
 
NLP State of the Art | BERT
NLP State of the Art | BERTNLP State of the Art | BERT
NLP State of the Art | BERT
 
Pre trained language model
Pre trained language modelPre trained language model
Pre trained language model
 
A Systematic Approach to Generate Diverse Instantiations for Conceptual Schemas
A Systematic Approach to Generate Diverse Instantiations for Conceptual SchemasA Systematic Approach to Generate Diverse Instantiations for Conceptual Schemas
A Systematic Approach to Generate Diverse Instantiations for Conceptual Schemas
 
Object Oriented Technologies
Object Oriented TechnologiesObject Oriented Technologies
Object Oriented Technologies
 
Hw6 interpreter iterator GoF
Hw6 interpreter iterator GoFHw6 interpreter iterator GoF
Hw6 interpreter iterator GoF
 
End-to-end sequence labeling via bi-directional LSTM-CNNs-CRF
End-to-end sequence labeling via bi-directional LSTM-CNNs-CRFEnd-to-end sequence labeling via bi-directional LSTM-CNNs-CRF
End-to-end sequence labeling via bi-directional LSTM-CNNs-CRF
 
BERT introduction
BERT introductionBERT introduction
BERT introduction
 

Similar to SOFTCARDINALITY: Learning to Identify Directional Cross-Lingual Entailment from Cardinalities and SMT

Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
Nimrita Koul
 
Speech recognition using vector quantization through modified k means lbg alg...
Speech recognition using vector quantization through modified k means lbg alg...Speech recognition using vector quantization through modified k means lbg alg...
Speech recognition using vector quantization through modified k means lbg alg...
Alexander Decker
 
深層意味表現学習 (Deep Semantic Representations)
深層意味表現学習 (Deep Semantic Representations)深層意味表現学習 (Deep Semantic Representations)
深層意味表現学習 (Deep Semantic Representations)
Danushka Bollegala
 
Text classification using Text kernels
Text classification using Text kernelsText classification using Text kernels
Text classification using Text kernels
Dev Nath
 
A survey on parallel corpora alignment
A survey on parallel corpora alignment A survey on parallel corpora alignment
A survey on parallel corpora alignment
andrefsantos
 
ECO_TEXT_CLUSTERING
ECO_TEXT_CLUSTERINGECO_TEXT_CLUSTERING
ECO_TEXT_CLUSTERING
George Simov
 

Similar to SOFTCARDINALITY: Learning to Identify Directional Cross-Lingual Entailment from Cardinalities and SMT (6)

Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
Speech recognition using vector quantization through modified k means lbg alg...
Speech recognition using vector quantization through modified k means lbg alg...Speech recognition using vector quantization through modified k means lbg alg...
Speech recognition using vector quantization through modified k means lbg alg...
 
深層意味表現学習 (Deep Semantic Representations)
深層意味表現学習 (Deep Semantic Representations)深層意味表現学習 (Deep Semantic Representations)
深層意味表現学習 (Deep Semantic Representations)
 
Text classification using Text kernels
Text classification using Text kernelsText classification using Text kernels
Text classification using Text kernels
 
A survey on parallel corpora alignment
A survey on parallel corpora alignment A survey on parallel corpora alignment
A survey on parallel corpora alignment
 
ECO_TEXT_CLUSTERING
ECO_TEXT_CLUSTERINGECO_TEXT_CLUSTERING
ECO_TEXT_CLUSTERING
 

More from Sergio Jimenez

Text Comparison Using Soft Cardinality
Text Comparison Using Soft CardinalityText Comparison Using Soft Cardinality
Text Comparison Using Soft Cardinality
Sergio Jimenez
 
SC spectra: A new soft cardinality approximation for text comparison
SC spectra: A new soft cardinality approximation for text comparisonSC spectra: A new soft cardinality approximation for text comparison
SC spectra: A new soft cardinality approximation for text comparison
Sergio Jimenez
 
Soft Cardinality: A Parameterized Similarity Function for Text Comparison
Soft Cardinality: A Parameterized Similarity Function for Text ComparisonSoft Cardinality: A Parameterized Similarity Function for Text Comparison
Soft Cardinality: A Parameterized Similarity Function for Text Comparison
Sergio Jimenez
 
SOFTCARDINALITY-CORE: Improving Text Overlap with Distributional Measures for...
SOFTCARDINALITY-CORE: Improving Text Overlap with Distributional Measures for...SOFTCARDINALITY-CORE: Improving Text Overlap with Distributional Measures for...
SOFTCARDINALITY-CORE: Improving Text Overlap with Distributional Measures for...
Sergio Jimenez
 
UNAL: Discriminating between Literal and Figurative Phrasal Usage Using Distr...
UNAL: Discriminating between Literal and Figurative Phrasal Usage Using Distr...UNAL: Discriminating between Literal and Figurative Phrasal Usage Using Distr...
UNAL: Discriminating between Literal and Figurative Phrasal Usage Using Distr...
Sergio Jimenez
 
SOFTCARDINALITY: Hierarchical Text Overlap for Student Response Analysis
SOFTCARDINALITY: Hierarchical Text Overlap for Student Response AnalysisSOFTCARDINALITY: Hierarchical Text Overlap for Student Response Analysis
SOFTCARDINALITY: Hierarchical Text Overlap for Student Response Analysis
Sergio Jimenez
 

More from Sergio Jimenez (6)

Text Comparison Using Soft Cardinality
Text Comparison Using Soft CardinalityText Comparison Using Soft Cardinality
Text Comparison Using Soft Cardinality
 
SC spectra: A new soft cardinality approximation for text comparison
SC spectra: A new soft cardinality approximation for text comparisonSC spectra: A new soft cardinality approximation for text comparison
SC spectra: A new soft cardinality approximation for text comparison
 
Soft Cardinality: A Parameterized Similarity Function for Text Comparison
Soft Cardinality: A Parameterized Similarity Function for Text ComparisonSoft Cardinality: A Parameterized Similarity Function for Text Comparison
Soft Cardinality: A Parameterized Similarity Function for Text Comparison
 
SOFTCARDINALITY-CORE: Improving Text Overlap with Distributional Measures for...
SOFTCARDINALITY-CORE: Improving Text Overlap with Distributional Measures for...SOFTCARDINALITY-CORE: Improving Text Overlap with Distributional Measures for...
SOFTCARDINALITY-CORE: Improving Text Overlap with Distributional Measures for...
 
UNAL: Discriminating between Literal and Figurative Phrasal Usage Using Distr...
UNAL: Discriminating between Literal and Figurative Phrasal Usage Using Distr...UNAL: Discriminating between Literal and Figurative Phrasal Usage Using Distr...
UNAL: Discriminating between Literal and Figurative Phrasal Usage Using Distr...
 
SOFTCARDINALITY: Hierarchical Text Overlap for Student Response Analysis
SOFTCARDINALITY: Hierarchical Text Overlap for Student Response AnalysisSOFTCARDINALITY: Hierarchical Text Overlap for Student Response Analysis
SOFTCARDINALITY: Hierarchical Text Overlap for Student Response Analysis
 

Recently uploaded

Poznań ACE event - 19.06.2024 Team 24 Wrapup slidedeck
Poznań ACE event - 19.06.2024 Team 24 Wrapup slidedeckPoznań ACE event - 19.06.2024 Team 24 Wrapup slidedeck
Poznań ACE event - 19.06.2024 Team 24 Wrapup slidedeck
FilipTomaszewski5
 
A Deep Dive into ScyllaDB's Architecture
A Deep Dive into ScyllaDB's ArchitectureA Deep Dive into ScyllaDB's Architecture
A Deep Dive into ScyllaDB's Architecture
ScyllaDB
 
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-EfficiencyFreshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
ScyllaDB
 
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
Jason Yip
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
Chart Kalyan
 
Principle of conventional tomography-Bibash Shahi ppt..pptx
Principle of conventional tomography-Bibash Shahi ppt..pptxPrinciple of conventional tomography-Bibash Shahi ppt..pptx
Principle of conventional tomography-Bibash Shahi ppt..pptx
BibashShahi
 
"What does it really mean for your system to be available, or how to define w...
"What does it really mean for your system to be available, or how to define w..."What does it really mean for your system to be available, or how to define w...
"What does it really mean for your system to be available, or how to define w...
Fwdays
 
Session 1 - Intro to Robotic Process Automation.pdf
Session 1 - Intro to Robotic Process Automation.pdfSession 1 - Intro to Robotic Process Automation.pdf
Session 1 - Intro to Robotic Process Automation.pdf
UiPathCommunity
 
Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving
 
What is an RPA CoE? Session 1 – CoE Vision
What is an RPA CoE?  Session 1 – CoE VisionWhat is an RPA CoE?  Session 1 – CoE Vision
What is an RPA CoE? Session 1 – CoE Vision
DianaGray10
 
"$10 thousand per minute of downtime: architecture, queues, streaming and fin...
"$10 thousand per minute of downtime: architecture, queues, streaming and fin..."$10 thousand per minute of downtime: architecture, queues, streaming and fin...
"$10 thousand per minute of downtime: architecture, queues, streaming and fin...
Fwdays
 
"NATO Hackathon Winner: AI-Powered Drug Search", Taras Kloba
"NATO Hackathon Winner: AI-Powered Drug Search",  Taras Kloba"NATO Hackathon Winner: AI-Powered Drug Search",  Taras Kloba
"NATO Hackathon Winner: AI-Powered Drug Search", Taras Kloba
Fwdays
 
Essentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation ParametersEssentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation Parameters
Safe Software
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
MichaelKnudsen27
 
Harnessing the Power of NLP and Knowledge Graphs for Opioid Research
Harnessing the Power of NLP and Knowledge Graphs for Opioid ResearchHarnessing the Power of NLP and Knowledge Graphs for Opioid Research
Harnessing the Power of NLP and Knowledge Graphs for Opioid Research
Neo4j
 
Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |
AstuteBusiness
 
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
Alex Pruden
 
Christine's Product Research Presentation.pptx
Christine's Product Research Presentation.pptxChristine's Product Research Presentation.pptx
Christine's Product Research Presentation.pptx
christinelarrosa
 
ScyllaDB Tablets: Rethinking Replication
ScyllaDB Tablets: Rethinking ReplicationScyllaDB Tablets: Rethinking Replication
ScyllaDB Tablets: Rethinking Replication
ScyllaDB
 
Day 2 - Intro to UiPath Studio Fundamentals
Day 2 - Intro to UiPath Studio FundamentalsDay 2 - Intro to UiPath Studio Fundamentals
Day 2 - Intro to UiPath Studio Fundamentals
UiPathCommunity
 

Recently uploaded (20)

Poznań ACE event - 19.06.2024 Team 24 Wrapup slidedeck
Poznań ACE event - 19.06.2024 Team 24 Wrapup slidedeckPoznań ACE event - 19.06.2024 Team 24 Wrapup slidedeck
Poznań ACE event - 19.06.2024 Team 24 Wrapup slidedeck
 
A Deep Dive into ScyllaDB's Architecture
A Deep Dive into ScyllaDB's ArchitectureA Deep Dive into ScyllaDB's Architecture
A Deep Dive into ScyllaDB's Architecture
 
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-EfficiencyFreshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
 
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
 
Principle of conventional tomography-Bibash Shahi ppt..pptx
Principle of conventional tomography-Bibash Shahi ppt..pptxPrinciple of conventional tomography-Bibash Shahi ppt..pptx
Principle of conventional tomography-Bibash Shahi ppt..pptx
 
"What does it really mean for your system to be available, or how to define w...
"What does it really mean for your system to be available, or how to define w..."What does it really mean for your system to be available, or how to define w...
"What does it really mean for your system to be available, or how to define w...
 
Session 1 - Intro to Robotic Process Automation.pdf
Session 1 - Intro to Robotic Process Automation.pdfSession 1 - Intro to Robotic Process Automation.pdf
Session 1 - Intro to Robotic Process Automation.pdf
 
Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024
 
What is an RPA CoE? Session 1 – CoE Vision
What is an RPA CoE?  Session 1 – CoE VisionWhat is an RPA CoE?  Session 1 – CoE Vision
What is an RPA CoE? Session 1 – CoE Vision
 
"$10 thousand per minute of downtime: architecture, queues, streaming and fin...
"$10 thousand per minute of downtime: architecture, queues, streaming and fin..."$10 thousand per minute of downtime: architecture, queues, streaming and fin...
"$10 thousand per minute of downtime: architecture, queues, streaming and fin...
 
"NATO Hackathon Winner: AI-Powered Drug Search", Taras Kloba
"NATO Hackathon Winner: AI-Powered Drug Search",  Taras Kloba"NATO Hackathon Winner: AI-Powered Drug Search",  Taras Kloba
"NATO Hackathon Winner: AI-Powered Drug Search", Taras Kloba
 
Essentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation ParametersEssentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation Parameters
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
 
Harnessing the Power of NLP and Knowledge Graphs for Opioid Research
Harnessing the Power of NLP and Knowledge Graphs for Opioid ResearchHarnessing the Power of NLP and Knowledge Graphs for Opioid Research
Harnessing the Power of NLP and Knowledge Graphs for Opioid Research
 
Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |
 
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
 
Christine's Product Research Presentation.pptx
Christine's Product Research Presentation.pptxChristine's Product Research Presentation.pptx
Christine's Product Research Presentation.pptx
 
ScyllaDB Tablets: Rethinking Replication
ScyllaDB Tablets: Rethinking ReplicationScyllaDB Tablets: Rethinking Replication
ScyllaDB Tablets: Rethinking Replication
 
Day 2 - Intro to UiPath Studio Fundamentals
Day 2 - Intro to UiPath Studio FundamentalsDay 2 - Intro to UiPath Studio Fundamentals
Day 2 - Intro to UiPath Studio Fundamentals
 

SOFTCARDINALITY: Learning to Identify Directional Cross-Lingual Entailment from Cardinalities and SMT