AUTOMATIC IDENTIFICATION OF PROVERB VARIANTS:
AN EXPERIMENT WITH BRAZILIAN PORTUGUESE
Amanda Rassi
Jorge Baptista
Oto Vale
PROPOR 2014 - International Conference
on Computational Processing of Portuguese
October 6-9, 2014 USP-São Carlos, SP, Brazil
Proverbs
• Definition
• a type of multiword expression (micro-texts)
• special citation status
• express atemporal truths
• combinatorial and lexical constraints
• sentences syntactically identical to ordinary sentences
• common lexicon
• Delimitation
• Proverbs ≠ frozen sentences (or idioms)
• Proverbs: subject position necessarily filled by a fixed element vs. Idioms: subject position distributionally free (in most cases)
Goals
• Automatically detect proverbs in texts, even when they are not introduced by any linguistic “quoting” device such as:
Como dizem ‘as they say’
Como dizia minha avó ‘as my grandmother used to say’
Dizem por aí ‘people say/they say’
Costuma dizer-se ‘it is often said’
• Identify the variants of proverbs, considering both formal and lexical variations.
Related work
• For French and Italian: Conenna (1998, 2000, 2004) and Lacavalla (2007)
• For French and Spanish: Brotons (2008)
• For European Portuguese (EP): Chacoto (2006, 2007, 2008)
• For Brazilian Portuguese (BP): no formal description
Motivation
• though relatively rare, proverbs are “islands of
meaning” in texts (citation status)
• often difficult to spot:
• lack formal marks
• formal and lexical variation
• often enter into wordplay
• discursive function is complex (entailment)
• relation with other textual elements is disturbed (no coreference)
Methods
• create a database with proverbs;
• define syntactic criteria to organize the collected
proverbs into similar formal classes;
• organize the elements according to POS;
• produce tables of core elements;
With Unitex 3.1 (Paumier 2003, 2014):
• create reference graphs with the basic syntactic
structures for each class;
• intersect the graphs with the tables of the proverbs’
core elements to produce finite-state transducers,
which can then be applied to texts.
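The last two steps can be illustrated outside Unitex. The sketch below is only an approximation in Python, with regular expressions standing in for Unitex graphs and transducers. The table layout, the function names, the gap size and the sample sentence are assumptions made for illustration; only proverb ID 0023, its class (P2F4) and its wording come from the slides.

```python
import re

# Hypothetical table of core elements: proverb ID -> (class, ordered core words).
# ID 0023 and its wording are from the slides; the layout is assumed.
CORE_TABLE = {
    "0023": ("P2F4", ["quem", "conta", "conto", "aumenta", "ponto"]),
}


def instantiate(core_words, max_gap=2):
    """Stand-in for crossing a reference graph with one table row:
    core words in order, with up to max_gap other words in between."""
    gap = r"(?:\s+\w+){0,%d}\s+" % max_gap
    body = gap.join(re.escape(w) for w in core_words)
    return re.compile(r"\b" + body + r"\b", re.IGNORECASE)


def compile_table(table):
    """Build one matcher per proverb in the table."""
    return {pid: instantiate(words) for pid, (_cls, words) in table.items()}


def find_proverbs(text, matchers):
    """Return (proverb ID, matched string) pairs found in the text."""
    hits = []
    for pid, rx in matchers.items():
        for m in rx.finditer(text):
            hits.append((pid, m.group(0)))
    return hits


matchers = compile_table(CORE_TABLE)
print(find_proverbs("Como dizem, quem conta um conto aumenta um ponto.", matchers))
```

Keeping the lexical content (the table) separate from the structural template means that adding a proverb only requires a new table row, which is the design the method describes.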
Collection of proverbs
• 5 different sources:
• list of proverbs in Wikipedia
• grand book of proverbs (Teixeira, 1942)
• 1001 proverbs (Steinberg, 1985)
• book of proverbs (Pinto, 2003)
• dictionary of proverbs (Magalhães Jr., 1974)
• Original list of 3,502 proverbs (and their variants)
• Final list of 594 proverbs (types or base-forms)
Classification criteria
• number of verb phrases/clauses (P1, P2 and P3)
• in P1
• impersonal constructions
• the verb is a copula verb
• obligatory negation (Neg)
• obligatory fronting of PP verb complement
• in P2
• comparatives
• coordinate/subordinate clauses
• verbless coordinated phrases
• obligatory fronting of 2nd verb phrase
• in P3 (no subclasses)
Formal classes
Proverbs that did not fit in any of the categories above were added to a residual class. Table 1 shows the breakdown of the proverbs (base-forms) per class.

Table 1. Formal classification of Brazilian Portuguese proverbs

| Class | Structure | Example | Approximate translation | Count |
| --- | --- | --- | --- | --- |
| P1F1 (impersonal) | Ø V w | Não há parto sem dor | ‘There is no painless childbirth’ | 20 |
| P1F2 | N0 Vcop Adj/N w | O silêncio é de ouro | ‘Silence is golden’ | 53 |
| P1F3 | N0 V w | Uma mão lava a outra | ‘One hand washes the other’ | 80 |
| P1F4 | N0 Neg V w | Cão que ladra não morde | ‘A barking dog seldom bites’ | 53 |
| P1F5 | Prep Ni N0 V w | Em terra de cego, quem tem um olho é rei | ‘In the land of the blind, the one-eyed is king’ | 45 |
| P2F1 (comparatives) | F1 Conjs-comp F2 | Antes só que mal acompanhado | ‘Better alone than in bad company’ | 39 |
| P2F2 (coordinated) | F1 Conjc F2 | Aqui se faz e aqui se paga | ‘What goes around comes around’ | 71 |
| P2F3 | NP1, NP2 | Cada cabeça, uma sentença | ‘Each head its sentence’ | 48 |
| P2F4 (subordinated) | Qu- F1 F2 | Quem ri por último ri melhor | ‘Who laughs last laughs best’ | 90 |
| P2F5 (subordinated) | F1 Conjs F2 | Pense duas vezes antes de agir | ‘Look before you leap’ | 20 |
| P2F6 (fronted subord.) | Conjs F2, F1 | Quando o gato sai de casa, os ratos fazem festa | ‘When the cat’s away, the mice will play’ | 28 |
| P3 | F1, F2, F3 | Mãos frias, coração quente, amor ardente | ‘Cold hands, warm heart, burning love’ | 24 |
| Residual | not specified | Comer e coçar é só começar | ‘To keep eating and scratching, just start’ | 43 |
| Total | | | | 614 |
Core elements
• Noun phrases (NP), subject (N0) or complement (N1):
• noun (N) or pronoun (PRO)
• adjective (Adj)
• optional determiners (Det) or modifiers (Mod)
• Verb phrases (VP):
• main verb (V)
• optional auxiliaries (Aux)
• adverbial modifiers (Mod)
Graphs and Transducers
Quem conta um conto aumenta um ponto
‘Who tells a tale adds a point’
[Figure: example of a reference graph for the P2F4 class]
[Figure: example of a finite-state transducer for proverb 0023 in the P2F4 class]
Concordance
[Figure: concordance of matches; each line has the form [proverb ID=core elements] matched string]
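Table 3. Results of automatic identification of proverbs by class

| Class | Proverbs | Matches | Types | True positives | False positives |
| --- | --- | --- | --- | --- | --- |
| P1F1 | 20 | 15 | 4 | 13 | 2 |
| P1F2 | 53 | 91 | 21 | 75 | 16 |
| P1F3 | 80 | 153 | 24 | 98 | 55 |
| P1F4 | 53 | 61 | 15 | 61 | 0 |
| P1F5 | 45 | 63 | 5 | 57 | 6 |
| P2F1 | 39 | 40 | 7 | 39 | 1 |
| P2F2 | 71 | 14 | 3 | 5 | 9 |
| P2F3 | 48 | 40 | 8 | 15 | 25 |
| P2F4 | 90 | 56 | 37 | 30 | 26 |
| P2F5 | 20 | 3 | 1 | 3 | 0 |
| P2F6 | 28 | 1 | 1 | 1 | 0 |
| P3 | 24 | 0 | 0 | 0 | 0 |
| Residual | 43 | 20 | 8 | 19 | 1 |
| Total | 614 | 557 | 134 | 416 | 141 |

Precision = 74.7% (416 true positives out of 557 matches)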
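The overall precision can be recomputed from the per-class rows of Table 3 as true positives divided by matches. The short script below does that; the figures are copied from the table, while the variable and function names are illustrative choices.

```python
# Per-class (matches, true positives), copied from Table 3.
TABLE3 = {
    "P1F1": (15, 13), "P1F2": (91, 75), "P1F3": (153, 98), "P1F4": (61, 61),
    "P1F5": (63, 57), "P2F1": (40, 39), "P2F2": (14, 5), "P2F3": (40, 15),
    "P2F4": (56, 30), "P2F5": (3, 3), "P2F6": (1, 1), "P3": (0, 0),
    "Residual": (20, 19),
}


def precision(matches, true_positives):
    """Return TP / matches, or None when a class produced no matches."""
    return true_positives / matches if matches else None


total_matches = sum(m for m, _ in TABLE3.values())
total_tp = sum(tp for _, tp in TABLE3.values())
overall = precision(total_matches, total_tp)
print(f"{total_tp}/{total_matches} = {overall:.1%}")  # 416/557 = 74.7%
```

Applying the same function per class shows where the errors concentrate: P2F2 (5 of 14) and P2F3 (15 of 40) are the least precise classes, and P3 produced no matches at all.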
  
Error analysis
• Specific subsets in P2F4 class:
Quem <MOT>* V <MOT>* V (e.g. Quem tem boca vai a Roma)
Quem <V> <V> (e.g. Quem cala consente)
• Constraints on V tense:
Quem(<V:P3s>+<V:J3s>+<V:W>)(<V:P3s>+<V:F3s>+<V:W>)

| P2F4 pattern | Matches | FP | Precision (P2F4) | Precision (all classes) |
| --- | --- | --- | --- | --- |
| Quem <MOT>* V <MOT>* V | 276 | 200 | 27.5% | 60.15% |
| Quem V V (no insertions) | 56 | 26 | 53.57% | 73.55% |
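The P2F4 precision figures follow from (matches − FP) / matches: 76/276 ≈ 27.5% with insertions and 30/56 ≈ 53.57% without. To illustrate why the pattern with free insertions over-matches, here is a rough Python approximation of the two patterns. The verb lexicon is a toy assumption (Unitex would look up <V> in its dictionaries), the check is anchored at the start of the sentence for simplicity, and the third test sentence is invented.

```python
import re

# Toy verb lexicon (an assumption); Unitex would consult its dictionaries for <V>.
VERB_FORMS = {"cala", "consente", "tem", "vai", "disse", "viria"}


def tokenize(sentence):
    """Lowercase the sentence and split it into word tokens."""
    return re.findall(r"\w+", sentence.lower())


def matches_strict(sentence):
    """Quem <V> <V>: the two verbs directly follow 'quem'."""
    t = tokenize(sentence)
    return (len(t) >= 3 and t[0] == "quem"
            and t[1] in VERB_FORMS and t[2] in VERB_FORMS)


def matches_loose(sentence):
    """Quem <MOT>* V <MOT>* V: any words may precede each verb."""
    t = tokenize(sentence)
    if not t or t[0] != "quem":
        return False
    return sum(1 for w in t[1:] if w in VERB_FORMS) >= 2


for s in ["Quem cala consente",
          "Quem tem boca vai a Roma",
          "Quem disse que ele viria?"]:
    print(s, matches_strict(s), matches_loose(s))
```

The loose pattern is needed to capture Quem tem boca vai a Roma, but it also accepts the ordinary question in the last line, which is the kind of false positive that lowers precision.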
Discussion - New variants
• The matches found allowed us to identify other variants of the same proverb that were not in the initial list:
Antes tarde do que nunca
‘Better late than never’
[Figure: new variants]
Discussion (cont.) - New proverbs
• It was also possible to find proverbs that were not in the previous list.
P2F4 class: quem V V ‘who V V’
Quem sabe faz
‘Who knows, does’
Quem sabe faz ao vivo
‘Who knows, does it live’
Discussion (cont.) – Insertion window length
• The length of the insertion window can vary, depending on the type of proverb involved (in general, at most 5 words).
O buraco [das negociações com o Congresso] é muito mais embaixo
‘the hole [in the negotiations with Congress] is much further down’
a justiça [que o brasileiro tanto almeja] começa dentro de casa
‘the justice [that Brazilians so much crave] begins at home’
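Both bracketed insertions above are exactly five words long, which is consistent with the five-word limit. The sketch below shows one way such a window could be expressed as a regular expression in Python. It is not the Unitex graph used in the experiment, and the core-element lists are assumed for illustration rather than taken from the authors' tables.

```python
import re


def window_pattern(core_words, max_insert):
    """Core words in order, with at most max_insert extra words
    between neighbouring core words."""
    gap = r"(?:\s+\w+){0,%d}\s+" % max_insert
    body = gap.join(map(re.escape, core_words))
    return re.compile(r"\b" + body + r"\b", re.IGNORECASE)


# Core elements below are assumed, not taken from the authors' tables.
EXAMPLES = [
    (["buraco", "é", "mais", "embaixo"],
     "O buraco das negociações com o Congresso é muito mais embaixo"),
    (["justiça", "começa", "casa"],
     "a justiça que o brasileiro tanto almeja começa dentro de casa"),
]

for core, sentence in EXAMPLES:
    for n in (4, 5):
        found = window_pattern(core, n).search(sentence) is not None
        print(core[0], "window =", n, "->", found)
```

With a window of 4 neither sentence is recognised, while a window of 5 recovers both. A wider window would recover more variants at the cost of more spurious matches.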
Discussion (cont.) – Separators
• In Portuguese proverbs, the use of the comma is not systematic, and in many cases it can be considered optional.
• The reference graphs allow the optional presence of punctuation between the core words.
Quem sai ao vento (,) perde o assento (optional comma)
‘Who goes out into the wind loses the seat’
Quando a esmola é demais (,) o santo desconfia (optional comma)
‘When the alms are too much, the saint gets suspicious’
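As a minimal illustration of optional separators, the following Python sketch accepts a proverb with or without a comma between its words. It is a regex approximation, not the authors' graph, and it handles only the comma; other punctuation marks would need to be added to the separator.

```python
import re


def with_optional_commas(words):
    """Words in order; a comma may or may not follow each word."""
    sep = r"\s*,?\s+"
    body = sep.join(map(re.escape, words))
    return re.compile(r"\b" + body + r"\b", re.IGNORECASE)


rx = with_optional_commas("quem sai ao vento perde o assento".split())
print(bool(rx.search("Quem sai ao vento perde o assento")))   # True
print(bool(rx.search("Quem sai ao vento, perde o assento")))  # True
```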
Discussion (cont.) – Transformations
• Some proverbs of the P1F2 class allow a mirror permutation:
O ataque é a melhor defesa
[Mirror Permut.] = A melhor defesa é o ataque
‘Attack is the best defense = The best defense is attack’
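The mirror permutation can be pictured as swapping the two phrases around the copula. The function below is a simplified string-level sketch in Python, not a description of how the transformation is encoded in the Unitex graphs. It assumes a single copula form and naively moves the sentence-initial capital, which would be wrong for a proper noun in subject position.

```python
def mirror_permutation(proverb, copula="é"):
    """Swap the phrases around the copula: 'X é Y' -> 'Y é X'."""
    left, sep, right = proverb.partition(f" {copula} ")
    if not sep or not left or not right:
        return None  # no usable copula split, so no permutation is attempted
    # Move the sentence-initial capital to the new first word (naive).
    new_first = right[0].upper() + right[1:]
    new_last = left[0].lower() + left[1:]
    return f"{new_first} {copula} {new_last}"


print(mirror_permutation("O ataque é a melhor defesa"))
# A melhor defesa é o ataque
```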
Discussion (cont.) - Negation
• Negation need not be treated as an obligatory element, since wordplay often removes it to produce some kind of effect:
Beleza não põe mesa
‘Beauty does not set the table’
Como a maioria das outras entrevistadas, Astrid diz que beleza põe mesa, sim
‘Like most other interviewees, Astrid says that beauty does set the table’
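One way to handle this, sketched below in Python, is to make the negation optional in the pattern and record whether it was present, so that the canonical form and the wordplay are both detected but remain distinguishable. The pattern and the returned fields are illustrative assumptions, not the authors' implementation.

```python
import re

# 'não' is optional; group 1 records whether it was present.
BELEZA = re.compile(r"\bbeleza\s+(?:(não)\s+)?põe\s+mesa\b", re.IGNORECASE)


def detect(text):
    """Return the matched string and whether the negation was present."""
    m = BELEZA.search(text)
    if m is None:
        return None
    return {"matched": m.group(0), "negated": m.group(1) is not None}


print(detect("Beleza não põe mesa"))
print(detect("Astrid diz que beleza põe mesa, sim"))
```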
Discussion (cont.) - Implicit clauses
• Some proverbs in the P2F2 class, formed by two propositions, may result from coordinating two simple proverbs with one proposition each:
Quem casa não pensa, quem pensa não casa
‘Who gets married doesn’t think, who thinks doesn’t get married’
Quem casa não pensa
‘Who gets married doesn’t think’
Quem pensa não casa
‘Who thinks doesn’t get married’
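A simple check for this situation is to split a two-clause proverb at the comma and look each clause up in the list of one-proposition proverbs. The Python sketch below assumes the clauses are separated by a comma and uses a hypothetical lookup set; it only illustrates the idea.

```python
def split_coordinated(proverb):
    """Split a two-clause proverb at its comma and recapitalise each clause."""
    clauses = [c.strip() for c in proverb.split(",") if c.strip()]
    return [c[0].upper() + c[1:] for c in clauses]


# Hypothetical list of known one-proposition proverbs.
KNOWN_SIMPLE = {"Quem casa não pensa", "Quem pensa não casa"}

parts = split_coordinated("Quem casa não pensa, quem pensa não casa")
print(parts, all(p in KNOWN_SIMPLE for p in parts))
```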
Synopsis
(1) the formal (syntactic) classification of proverbs into 13 classes: this classification may serve as a starting point for deeper analysis of each of these proverbial structures;
(2) the identification of the core elements of each proverb: the methodology presented to extract keywords can be replicated for other corpora in order to check different text types and domains;
(3) the definition of an adequate length for the insertion window (words and punctuation), which may vary depending on the class of proverbs.
Thank you!
Questions, please!
