Communicative verbs and
constructions as markers of
     text Media genre


  Lidia Pivovarova, Elena Yagunova
      Saint-Petersburg State University
            Edward Klyshinsky
 Keldysh IAM of the Russian Academy of Sciences
Introduction
• Ongoing research on different text genres:
  – Media texts, scientific papers, fiction…
• Our approach:
  – Different genres mean different corpora:
    a special corpus for each genre
  – Formal markers of a corpus are keywords,
    collocations, constructions, POS
    co-occurrences – found by different statistics
Media texts
• Different (sub)genres, subtypes:
  – news, articles, analytical review, etc.
• Sources are heterogeneous:
  – newspaper, news feed, etc.

• Trade-off between the informative
  and persuasive functions of language
• Communicative verbs and constructions
  may help to study this trade-off
Constructions in media texts
• Media texts are stereotyped
  – Especially texts with high informative function
• Long word sequences (up to 7 words) may
  be repeated in every text of a corpus
• These word sequences are mostly
  constructions
• The most frequent constructions introduce
  the information source:
  – According to RIA News…, Our reporter says
    with reference to…
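
Repeated word sequences of this kind could be found by counting n-grams by document frequency. The following is a minimal sketch, assuming whitespace-tokenized texts; the function names, the toy texts and the `min_docs` threshold are illustrative assumptions, not the procedure used in the study.

```python
from collections import Counter

def ngrams(tokens, n):
    """Yield all contiguous n-token sequences as tuples."""
    return (tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def repeated_ngrams(texts, max_n=7, min_docs=2):
    """Return n-grams (2..max_n tokens) that occur in at least min_docs texts."""
    doc_freq = Counter()
    for text in texts:
        tokens = text.lower().split()
        seen = set()
        for n in range(2, max_n + 1):
            seen.update(ngrams(tokens, n))
        doc_freq.update(seen)  # each n-gram is counted once per text
    return {gram: df for gram, df in doc_freq.items() if df >= min_docs}

# Toy texts, for illustration only.
texts = [
    "according to ria news the minister resigned",
    "according to ria news prices rose",
    "our reporter says prices rose",
]
result = repeated_ngrams(texts)
```

Counting by document frequency rather than raw frequency favours sequences that recur across texts, which is what makes them candidates for stereotyped constructions.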
Communicative verbs and
          constructions
• Reuters reported with reference to the Ministry of
  Internal Affairs …
                   vs.
• The Ministry of Internal Affairs informs citizens...



• The Prime Minister stated that…
                 vs.
• Our president said that…
Materials
• Lenta.ru
    – a news feed with short information messages;
    – 2005-2010, ~60 mln tokens (words and punctuation marks)
• Nezavisimaya gazeta
    – a traditional newspaper with articles, interviews and analytics;
    – 1999-2009, ~60 mln tokens
•   Compulenta
    – a news portal specializing in IT matters;
    – 2009, ~1.5 mln tokens
• Russian Information Agency
    – one of the most authoritative news sources in Russia;
    – 2009-2010, ~60 mln tokens
• RosBusinessConsulting news
    – a news portal specializing in economic and political information
      for business;
    – 2009, ~20 mln tokens
• etc.
Materials
• style and format opposition:
  – traditional newspaper – news feed


• subject opposition:
  – general news – IT news – business news
Communicative verbs
• RussNet Project
  – Russian WordNet project, Saint-Petersburg
    State University, group of I.V. Azarova
• Only a list of verbs (instead of their full
  descriptions)
• Divided into three groups:
  communicative, mental and imagination
  verbs
  – The division is rather contradictory: He thinks… often
    means He said…
Specificity of news (newspaper)
                 texts
• Word frequencies are different from
  frequencies in the language as a whole

• СООБЩАТЬ (to inform, to report) –
  more frequent
• ПИСАТЬ (to write), ГОВОРИТЬ (to say)
  – less frequent
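
One way to make "more frequent" and "less frequent" concrete is to compare relative frequencies in a news corpus and a reference corpus. The sketch below assumes per-million normalization and uses hypothetical counts; the function names and all numbers are illustrative and do not come from the study.

```python
def per_million(count, corpus_size):
    """Relative frequency: occurrences per million tokens."""
    return count * 1_000_000 / corpus_size

def frequency_ratio(count, size, ref_count, ref_size):
    """Ratio > 1: more frequent than in the reference corpus; < 1: less frequent."""
    return per_million(count, size) / per_million(ref_count, ref_size)

# Hypothetical counts, for illustration only (not from the study).
news = {"size": 1_000_000, "сообщать": 500, "говорить": 100}
reference = {"size": 2_000_000, "сообщать": 200, "говорить": 1_000}

ratios = {
    verb: frequency_ratio(news[verb], news["size"], reference[verb], reference["size"])
    for verb in ("сообщать", "говорить")
}
```

Normalizing by corpus size keeps the comparison meaningful when the two corpora differ in length.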
Lenta.ru: most frequent verbs
БЫТЬ         72993   to be
СООБЩАТЬ     27498   to report, to inform
СТАТЬ        12614   to become
ЗАЯВИТЬ      10861   to announce
МОЧЬ         10341   to be able
ПОЛУЧАТЬ      7924   to receive
ЯВЛЯТЬСЯ      7590   to be, to come
СООБЩАТЬСЯ    6670   to report (reflexive voice)
ОТМЕТИТЬ      5806   to mark, to mention
ПИСАТЬ        5462   to write
НАХОДИТЬ      5221   to find, to consider
НАПОМНИТЬ     5209   to remind
НАХОДИТЬСЯ    5120   to be situated
НАЧАТЬ        4907   to begin
СООБЩИТЬ      4832   to report (perfective aspect)
Lenta.ru: RussNet communicative verbs
 СООБЩАТЬ       27498   to report
 ЗАЯВИТЬ        10861   to announce
 СООБЩАТЬСЯ      6670   to report (reflexive voice)
 ПИСАТЬ          5462   to write
 СООБЩИТЬ        4832   to report (perfective aspect)
 СКАЗАТЬ         2281   to say (perfective aspect)
 ГОВОРИТЬСЯ      1765   to say (reflexive voice)
 ПОДЧЕРКНУТЬ     1637   to underline
 ЗАЯВЛЯТЬ        1165   to announce (imperfective)
 ГОВОРИТЬ        1104   to say
 ПОЯСНИТЬ        880    to elucidate
 НАПИСАТЬ        841    to write (perfective)
 ВЫЯСНИТЬСЯ      803    to find out (reflexive voice)
 ВЫЯСНИТЬ        662    to find out
 ПОДЧЕРКИВАТЬ    646    to underline
 ЗАПИСАТЬ        248    to write down
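
The communicative-verb list can be read as the general verb frequency list restricted to the RussNet verbs. The sketch below uses counts copied from the "Lenta.ru: most frequent verbs" slide; the set `COMMUNICATIVE` holds only a few verbs from the slides, not the full RussNet list, and the function name is an assumption.

```python
# Counts copied from the "Lenta.ru: most frequent verbs" slide (first ten rows).
verb_counts = {
    "БЫТЬ": 72993, "СООБЩАТЬ": 27498, "СТАТЬ": 12614, "ЗАЯВИТЬ": 10861,
    "МОЧЬ": 10341, "ПОЛУЧАТЬ": 7924, "ЯВЛЯТЬСЯ": 7590, "СООБЩАТЬСЯ": 6670,
    "ОТМЕТИТЬ": 5806, "ПИСАТЬ": 5462,
}

# Partial, illustrative subset of the communicative verbs named on the slides.
COMMUNICATIVE = {"СООБЩАТЬ", "ЗАЯВИТЬ", "СООБЩАТЬСЯ", "ПИСАТЬ", "СООБЩИТЬ", "СКАЗАТЬ"}

def communicative_profile(counts, verb_list):
    """Keep only verbs from verb_list, sorted by descending frequency."""
    selected = [(verb, count) for verb, count in counts.items() if verb in verb_list]
    return sorted(selected, key=lambda pair: pair[1], reverse=True)

profile = communicative_profile(verb_counts, COMMUNICATIVE)
```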
RosBusinessConsulting: most frequent verbs
БЫТЬ          270391   to be
НАПОМНИТЬ     65126    to remind
ОТМЕТИТЬ      47770    to mark, to mention
МОЧЬ          46877    to be able
СООБЩИТЬ      41867    to report (perfective aspect)
СОСТАВИТЬ     40057    to make, to form
СТАТЬ         35792    to become
ЗАЯВИТЬ       34767    to announce
СООБЩАТЬ      26744    to report, to inform
ЯВЛЯТЬСЯ      24216    to be, to come
СЧИТАТЬ       21808    to consider
ПЕРЕДАВАТЬ    20676    to broadcast
НАЧАТЬ        19805    to begin
НАХОДИТЬ      19309    to find, to consider
ПОЛУЧИТЬ      19284    to receive
СКАЗАТЬ       19123    to say (perfective aspect)
ВЫРАСТИ       19034    to increase
НАХОДИТЬСЯ    18996    to be situated
ПРОИЗОЙТИ     17505    to happen
ПОДЧЕРКНУТЬ   15129    to underline
RosBusinessConsulting: RussNet communicative verbs
СООБЩИТЬ       41867   to report (perfective aspect)
ЗАЯВИТЬ        34747   to announce
СООБЩАТЬ       26744   to report
СКАЗАТЬ        19123   to say (perfective aspect)
ПОДЧЕРКНУТЬ    15129   to underline
ГОВОРИТЬ       10788   to say
ГОВОРИТЬСЯ     8314    to say (reflexive voice)
СООБЩАТЬСЯ     7873    to report (reflexive voice)
ЗАЯВЛЯТЬ       5092    to announce (imperfective)
ПОЯСНИТЬ       3487    to elucidate
ВЫСКАЗАТЬ      3167    to state, to express
ПОДЧЕРКИВАТЬ   2157    to underline (imperfective)
ПИСАТЬ         1966    to write
ВЫЯСНИТЬСЯ     1657    to find out (reflexive voice)
ВЫСКАЗАТЬСЯ    1390    to state, to express (reflexive)
ВЫЯСНИТЬ       1189    to find out
ВЫЯСНЯТЬСЯ      622    to find out (reflexive, imperfective)
НАПИСАТЬ        565    to write (perfective aspect)
Nezavisimaya gazeta: most frequent verbs
БЫТЬ        1550085   to be
МОЧЬ         239754   to be able
СТАТЬ        157307   to become
СКАЗАТЬ      101534   to say (perfective aspect)
ГОВОРИТЬ      92014   to say
СЧИТАТЬ       86323   to consider
ИМЕТЬ         75300   to have
ЗАЯВИТЬ       69021   to announce
ЯВЛЯТЬСЯ      68759   to be, to come
НАЧАТЬ        62696   to begin
ИДТИ          61977   to go, to go on
ПОЛУЧИТЬ      54966   to receive
ОКАЗАТЬСЯ     49748   to turn out, to find out
ЗНАТЬ         49413   to know
ХОТЕТЬ        48822   to want
НАХОДИТЬ      46363   to find, to consider
СДЕЛАТЬ       45003   to do
РАБОТАТЬ      44392   to work
СООБЩИТЬ      42477   to report, to inform
Nezavisimaya gazeta: RussNet communicative verbs
СКАЗАТЬ       101534 to say (perfective aspect)
ГОВОРИТЬ       92014 to say
ЗАЯВИТЬ        69021 to announce
СООБЩИТЬ       42447 to report (perfective aspect)
ПИСАТЬ         31785 to write
СООБЩАТЬ       14988 to report
ГОВОРИТЬСЯ     13116 to say (reflexive voice)
НАПИСАТЬ       11315 to write (perfective)
ЗАЯВЛЯТЬ        9763 to announce
ПОЯСНИТЬ        8790 to elucidate
ВЫЯСНИТЬСЯ      6325 to find out (reflexive voice)
ВЫСКАЗАТЬ       4711 to state, to express
ВЫСКАЗАТЬСЯ     4604 to state, to express (reflexive)
СООБЩАТЬСЯ      3895 to report (reflexive)
ПОВТОРЯТЬ       3190 to repeat
ПОВТОРИТЬ       2704 to repeat (perfective)
ВЫСКАЗЫВАТЬСЯ   2612 to express (reflexive, imperfective)
ВЫСКАЗЫВАТЬ     2194 to express (imperfective)
Discussion
• “Nezavisimaya gazeta” is quite different
  from other sources
• Communicative verbs help to identify the
  style and format opposition (traditional
  newspaper – news feed) but not the
  subject oppositions (all news – IT news –
  business news).
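
The difference between the sources can be illustrated with the counts from the communicative-verb slides. The sketch below compares "say" verbs with "report" verbs in each corpus; this ratio is an illustrative measure chosen here, not one used by the authors.

```python
# Counts copied from the RussNet communicative-verb slides of each corpus.
# For Nezavisimaya gazeta the two slides give 42477 and 42447 for СООБЩИТЬ;
# the value from the communicative-verb slide is used here.
counts = {
    "Lenta.ru": {"СООБЩАТЬ": 27498, "СООБЩИТЬ": 4832, "СКАЗАТЬ": 2281, "ГОВОРИТЬ": 1104},
    "RosBusinessConsulting": {"СООБЩАТЬ": 26744, "СООБЩИТЬ": 41867, "СКАЗАТЬ": 19123, "ГОВОРИТЬ": 10788},
    "Nezavisimaya gazeta": {"СООБЩАТЬ": 14988, "СООБЩИТЬ": 42447, "СКАЗАТЬ": 101534, "ГОВОРИТЬ": 92014},
}

SAY = ("СКАЗАТЬ", "ГОВОРИТЬ")        # to say
REPORT = ("СООБЩАТЬ", "СООБЩИТЬ")    # to report, to inform

def say_report_ratio(verb_counts):
    """Ratio of 'say' verb occurrences to 'report' verb occurrences."""
    say = sum(verb_counts[verb] for verb in SAY)
    report = sum(verb_counts[verb] for verb in REPORT)
    return say / report

ratios = {corpus: say_report_ratio(vc) for corpus, vc in counts.items()}
for corpus, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{corpus}: {ratio:.2f}")
```

With these counts the ratio is below 1 for both news feeds and above 1 for the traditional newspaper, which is consistent with the style and format opposition.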
Combinatorial databases
• A complete set of word combinations extracted
  from different corpora – a separate database for
  each corpus
• Combinations are sorted by grammar patterns:
  V+N, V+P+N, etc.
• It is possible to fix V and find all the
  combinations that include it
• We use communicative verbs to create a
  corpus “passport”
  – not a complete description, but important information about
    style and genre (not subject)
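
Fixing V and retrieving its combinations can be sketched as a lookup in an index keyed by grammar pattern and verb. The rows below are copied from the RosBusinessConsulting V+P+N and V+N slides; the in-memory layout and the function names are assumptions, since the slides do not describe how the databases are stored.

```python
from collections import defaultdict

# (pattern, lemmas, count) rows copied from the RosBusinessConsulting slides.
rows = [
    ("V+P+N", ("ЗАЯВИТЬ", "О", "НАМЕРЕНИЕ"), 1337),
    ("V+P+N", ("ЗАЯВИТЬ", "В", "ИНТЕРВЬЮ"), 937),
    ("V+P+N", ("ГОВОРИТЬСЯ", "В", "СООБЩЕНИЕ"), 6509),
    ("V+N", ("ЗАЯВИТЬ", "ГЛАВА"), 3465),
    ("V+N", ("ВЫСКАЗАТЬ", "МНЕНИЕ"), 3913),
]

def build_index(rows):
    """Index combinations by (pattern, verb); the verb is the first lemma."""
    index = defaultdict(list)
    for pattern, lemmas, count in rows:
        index[(pattern, lemmas[0])].append((lemmas, count))
    return index

def combinations_for(index, pattern, verb):
    """All combinations of a pattern that include the fixed verb, most frequent first."""
    return sorted(index.get((pattern, verb), []), key=lambda item: item[1], reverse=True)

index = build_index(rows)
hits = combinations_for(index, "V+P+N", "ЗАЯВИТЬ")
```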
RosBusinessConsulting: V+P+N
ГОВОРИТЬСЯ В СООБЩЕНИЕ          6509   message says
СООБЩИТЬ В ПРЕССЛУЖБА           5665   according to the press service
ГОВОРИТЬСЯ В ЗАЯВЛЕНИЕ          4493   statement says
СООБЩИТЬ В УПРАВЛЕНИЕ           1865   according to the administration
ЗАЯВИТЬ О НАМЕРЕНИЕ             1337   to declare the intention
ОБЪЯВИТЬ О НАМЕРЕНИЕ            1053   to announce the intention
СООБЩИТЬ НА ПРЕСС-КОНФЕРЕНЦИЯ   985    announce at a press conference
ЗАЯВИТЬ В ИНТЕРВЬЮ              937    say in an interview
ЗАЯВИТЬ О ГОТОВНОСТЬ            913    declare readiness
ГОВОРИТЬСЯ В ДОКЛАД             789    report says
ГОВОРИТЬСЯ В МАТЕРИАЛ           745    material says
ЗАЯВИТЬ О НЕОБХОДИМОСТЬ         733    declare necessity
ГОВОРИТЬСЯ В ОТЧЕТ              693    report says
ВЫСКАЗАТЬ В ЭФИР                693    to state on the air
ГОВОРИТЬСЯ В ПРЕСС-РЕЛИЗ        661    press release says
ЗАЯВИТЬ В ЭФИР                  637    to declare on the air
RosBusinessConsulting: V+N
ВЫСКАЗАТЬ МНЕНИЕ         3913   express an opinion
ЗАЯВИТЬ ГЛАВА            3465   the head announced
ЗАЯВИТЬ ПРЕЗИДЕНТ        3233   the president announced
СООБЩИТЬ ПРЕДСТАВИТЕЛЬ   3069   the spokesman informed
ЗАЯВИТЬ ПРЕДСТАВИТЕЛЬ    2869   the spokesman announced
СООБЩИТЬ ПРЕСС-СЛУЖБА    2813   the press service informed
ЗАЯВИТЬ МИНИСТР          2313   the minister announced
СООБЩИТЬ ГЛАВА           2309   the head informed
СКАЗАТЬ ГЛАВА            2301   the head said
СКАЗАТЬ ПРЕЗИДЕНТ        1981   the president said
СКАЗАТЬ МИНИСТР          1861   the minister said
СООБЩИТЬ МИНИСТР         1785   the minister informed
ПОДЧЕРКНУТЬ ГЛАВА        1705   the head underlined
ПОДЧЕРКНУТЬ ПРЕЗИДЕНТ    1485   the president underlined
СООБЩИТЬ ИСТОЧНИК        1485   the source reported
СООБЩИТЬ ЗАМЕСТИТЕЛЬ     1321   the deputy reported
СКАЗАТЬ ПРЕДСТАВИТЕЛЬ    1101   the spokesman said
СООБЩИТЬ ПРЕЗИДЕНТ       993    the president informed
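
The V+N rows can also be aggregated by noun to see which speakers dominate the corpus. A sketch over a subset of the rows above; grouping by noun is an illustrative choice, not a step described on the slides.

```python
from collections import Counter

# (verb, noun, count) rows copied from the V+N slide (subset).
vn_rows = [
    ("ЗАЯВИТЬ", "ГЛАВА", 3465), ("СООБЩИТЬ", "ГЛАВА", 2309),
    ("СКАЗАТЬ", "ГЛАВА", 2301), ("ПОДЧЕРКНУТЬ", "ГЛАВА", 1705),
    ("ЗАЯВИТЬ", "ПРЕЗИДЕНТ", 3233), ("СКАЗАТЬ", "ПРЕЗИДЕНТ", 1981),
    ("ПОДЧЕРКНУТЬ", "ПРЕЗИДЕНТ", 1485), ("СООБЩИТЬ", "ПРЕЗИДЕНТ", 993),
    ("ЗАЯВИТЬ", "МИНИСТР", 2313), ("СКАЗАТЬ", "МИНИСТР", 1861),
    ("СООБЩИТЬ", "МИНИСТР", 1785),
]

def totals_by_noun(rows):
    """Sum combination counts per noun across all verbs."""
    totals = Counter()
    for _verb, noun, count in rows:
        totals[noun] += count
    return totals

totals = totals_by_noun(vn_rows)
ranking = [noun for noun, _ in totals.most_common()]
```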
Conclusion
• It is possible to classify communicative verbs as
  (mostly) informative or (mostly) persuasive
• However, it is not our goal: we are focused on
  corpora, not words
• We try to find as many formal markers as
  possible
  – collocations, keywords, POS distribution…
• Communicative verbs are rather informative
  markers for Media text genres

Communicative verbs and constructions as markers

  • 1. Communicative verbs and constructions as markers of text Media genre Lidia Pivovarova, Elena Yagunova Saint-Petersburg State University Edward Klyshinsky Keldysh IAM of Russia Academy of Science
  • 2. Introduction • Ongoing research of different text genres: – Media texts, scientific papers, fiction… • Our approach: – Different genres means different corpora - special corpus for each genre – Formal markers of a corpus are keywords, collocations, constructions, POS co- occurrences – found by different statistics
  • 3. Media texts • Different (sub)genres, subtypes: – news, articles, analytical review, etc. • Sources are heterogeneous: – newspaper, news feed, etc. • Trade-off between between informative and persuasive functions of language • Communicative verbs and constructions may help to study this trade-off
  • 4. Constructions in media texts • Media text are stereotyped – Especially texts with high informative function • Long word sequences (up to 7 words) may be repeated in every text of a corpus • These word sequences are mostly constructions • The most frequent constructions are introduction of information source: – According to RIA News…, Our reporter says with reference to…
  • 5. Communicative verbs and constructions • Reuters reported with reference to Ministry of Internal Affair … vs. • Ministry of Internal Affair informs citizens... • The Prime Minister stated that… vs. • Our president said that…
  • 6. Materials • Lenta.ru – a news feed with short information messages; – 2005-2010, ~60 mln tokens (words and punctuation mark) • Nezavisimaya gazeta – a traditional newspaper with articles, interviews and analytics; – 1999-2009, ~60 mln tokens • Compulenta – a news portal specializing in IT matters; – 2009, ~1,5 mln tokens • Russian Information Agency – one of the most authoritative news sources in Russia; – 2009-2010, ~60 mln tokens • RosBusinessConsulting news – a news portal specializing in economical and political information for business. – 2009, ~20 mln. Tokens • etc.
  • 7. Materials • style and format opposition: – traditional newspaper – news feed • subject opposition: – general news – IT news – business news
  • 8. Communicative verbs • RussNet Project – Russian WordNet project, Saint-Petersburg State University, group of I.V. Azarova • Only a list of verbs (instead of their full descriptions) • Broke up into three groups: communicative, mental and imagination verbs – Rather contradictional: He thinks… often means He said…
  • 9. Specificity of news (newspaper) texts • Word frequencies are different from frequencies in language at the whole • COOБЩАТЬ (to inform, to report) – more frequent • ПИСАТЬ (to write), ГОВОРИТЬ (to say) – less frequent
  • 10. Lenta.ru: most frequent verbs БЫТЬ 72993 to be СООБЩАТЬ 27498 to report, to inform СТАТЬ 12614 to become ЗАЯВИТЬ 10861 to announce МОЧЬ 10341 to be able ПОЛУЧАТЬ 7924 to receive ЯВЛЯТЬСЯ 7590 to be, to come СООБЩАТЬСЯ 6670 to report (reflexive voice) ОТМЕТИТЬ 5806 to mark, to mention ПИСАТЬ 5462 to write НАХОДИТЬ 5221 to find, to consider НАПОМНИТЬ 5209 to remind НАХОДИТЬСЯ 5120 to be situated НАЧАТЬ 4907 to begin СООБЩИТЬ 4832 to report (perfective aspect)
  • 11. Lenta.ru: RussNet communicative verbs СООБЩАТЬ 27498 to report ЗАЯВИТЬ 10861 to announce СООБЩАТЬСЯ 6670 to report (reflexive voice) ПИСАТЬ 5462 to write СООБЩИТЬ 4832 to report (perfective aspect) СКАЗАТЬ 2281 to say (perfective aspect) ГОВОРИТЬСЯ 1765 to say (reflexive voice) ПОДЧЕРКНУТЬ 1637 to underline ЗАЯВЛЯТЬ 1165 to announce (imperfective) ГОВОРИТЬ 1104 to say ПОЯСНИТЬ 880 to elucidate НАПИСАТЬ 841 to write (perfective) ВЫЯСНИТЬСЯ 803 to find out (reflexive voice) ВЫЯСНИТЬ 662 to find out ПОДЧЕРКИВАТЬ 646 to underline ЗАПИСАТЬ 248 to write down
  • 12. RosBusinessConsulting: most frequent verbs
      Verb           Frequency   Gloss
      БЫТЬ           270391      to be
      НАПОМНИТЬ      65126       to remind
      ОТМЕТИТЬ       47770       to mark, to mention
      МОЧЬ           46877       to be able
      СООБЩИТЬ       41867       to report (perfective aspect)
      СОСТАВИТЬ      40057       to make, to form
      СТАТЬ          35792       to become
      ЗАЯВИТЬ        34767       to announce
      СООБЩАТЬ       26744       to report, to inform
      ЯВЛЯТЬСЯ       24216       to be, to come
      СЧИТАТЬ        21808       to consider
      ПЕРЕДАВАТЬ     20676       to broadcast
      НАЧАТЬ         19805       to begin
      НАХОДИТЬ       19309       to find, to consider
      ПОЛУЧИТЬ       19284       to receive
      СКАЗАТЬ        19123       to say (perfective aspect)
      ВЫРАСТИ        19034       to increase
      НАХОДИТЬСЯ     18996       to be situated
      ПРОИЗОЙТИ      17505       to happen
      ПОДЧЕРКНУТЬ    15129       to underline
  • 13. RosBusinessConsulting: RussNet communicative verbs
      Verb            Frequency   Gloss
      СООБЩИТЬ        41867       to report (perfective aspect)
      ЗАЯВИТЬ         34747       to announce
      СООБЩАТЬ        26744       to report
      СКАЗАТЬ         19123       to say (perfective aspect)
      ПОДЧЕРКНУТЬ     15129       to underline
      ГОВОРИТЬ        10788       to say
      ГОВОРИТЬСЯ      8314        to say (reflexive voice)
      СООБЩАТЬСЯ      7873        to report (reflexive voice)
      ЗАЯВЛЯТЬ        5092        to announce (imperfective)
      ПОЯСНИТЬ        3487        to elucidate
      ВЫСКАЗАТЬ       3167        to state, to express
      ПОДЧЕРКИВАТЬ    2157        to underline (imperfective)
      ПИСАТЬ          1966        to write
      ВЫЯСНИТЬСЯ      1657        to find out (reflexive voice)
      ВЫСКАЗАТЬСЯ     1390        to state, to express (reflexive)
      ВЫЯСНИТЬ        1189        to find out
      ВЫЯСНЯТЬСЯ      622         to find out (reflexive, imperfective)
      НАПИСАТЬ        565         to write (perfective aspect)
  • 14. Nezavisimaya gazeta: most frequent verbs
      Verb         Frequency   Gloss
      БЫТЬ         1550085     to be
      МОЧЬ         239754      to be able
      СТАТЬ        157307      to become
      СКАЗАТЬ      101534      to say (perfective aspect)
      ГОВОРИТЬ     92014       to say
      СЧИТАТЬ      86323       to consider
      ИМЕТЬ        75300       to have
      ЗАЯВИТЬ      69021       to announce
      ЯВЛЯТЬСЯ     68759       to be, to come
      НАЧАТЬ       62696       to begin
      ИДТИ         61977       to go, to go on
      ПОЛУЧИТЬ     54966       to receive
      ОКАЗАТЬСЯ    49748       to turn out, to find out
      ЗНАТЬ        49413       to know
      ХОТЕТЬ       48822       to want
      НАХОДИТЬ     46363       to find, to consider
      СДЕЛАТЬ      45003       to do
      РАБОТАТЬ     44392       to work
      СООБЩИТЬ     42477       to report, to inform
  • 15. Nezavisimaya gazeta: RussNet communicative verbs
      Verb             Frequency   Gloss
      СКАЗАТЬ          101534      to say (perfective aspect)
      ГОВОРИТЬ         92014       to say
      ЗАЯВИТЬ          69021       to announce
      СООБЩИТЬ         42447       to report (perfective aspect)
      ПИСАТЬ           31785       to write
      СООБЩАТЬ         14988       to report
      ГОВОРИТЬСЯ       13116       to say (reflexive voice)
      НАПИСАТЬ         11315       to write (perfective)
      ЗАЯВЛЯТЬ         9763        to announce
      ПОЯСНИТЬ         8790        to elucidate
      ВЫЯСНИТЬСЯ       6325        to find out (reflexive voice)
      ВЫСКАЗАТЬ        4711        to state, to express
      ВЫСКАЗАТЬСЯ      4604        to state, to express (reflexive)
      СООБЩАТЬСЯ       3895        to report (reflexive)
      ПОВТОРЯТЬ        3190        to repeat
      ПОВТОРИТЬ        2704        to repeat (perfective)
      ВЫСКАЗЫВАТЬСЯ    2612        to express (reflexive, imperfective)
      ВЫСКАЗЫВАТЬ      2194        to express (imperfective)
  • 16. Discussion • “Nezavisimaya gazeta” is quite different from the other sources • Communicative verbs help to identify the style and format opposition (traditional newspaper – news feed) but not the subject opposition (general news – IT news – business news).
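One way to make the difference between the sources concrete is to compare, within each corpus, how often the "say" verbs occur relative to the "report" verbs. The sketch below is an illustrative measure chosen here, not a statistic from the slides. The counts are copied from the communicative-verb slides (11, 13 and 15); the grouping into `SAY` and `REPORT` and the name `say_report_ratio` are assumptions.

```python
# Counts copied from the communicative-verb slides (11, 13, 15).
counts = {
    "Lenta.ru": {
        "СООБЩАТЬ": 27498, "СООБЩИТЬ": 4832,
        "СКАЗАТЬ": 2281, "ГОВОРИТЬ": 1104,
    },
    "RosBusinessConsulting": {
        "СООБЩИТЬ": 41867, "СООБЩАТЬ": 26744,
        "СКАЗАТЬ": 19123, "ГОВОРИТЬ": 10788,
    },
    "Nezavisimaya gazeta": {
        "СКАЗАТЬ": 101534, "ГОВОРИТЬ": 92014,
        "СООБЩИТЬ": 42447, "СООБЩАТЬ": 14988,
    },
}

# Illustrative grouping (an assumption, not taken from RussNet).
SAY = ("СКАЗАТЬ", "ГОВОРИТЬ")
REPORT = ("СООБЩАТЬ", "СООБЩИТЬ")


def say_report_ratio(verb_counts):
    """Ratio of 'say' verb occurrences to 'report' verb occurrences."""
    say = sum(verb_counts[v] for v in SAY)
    report = sum(verb_counts[v] for v in REPORT)
    return say / report


ratios = {name: say_report_ratio(c) for name, c in counts.items()}
for name, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{name}: {ratio:.2f}")
```

Because the ratio is computed inside each corpus, corpus size cancels out, so the three values can be compared directly. On these counts the two news feeds stay below 1, while Nezavisimaya gazeta is well above 1.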
  • 17. Combinatorial databases • A complete set of word combinations extracted from different corpora – a separate database for each corpus • Combinations are sorted by grammar patterns: V+N, V+P+N, etc. • It is possible to fix V and find all the combinations that include it • We use communicative verbs to create a corpus “passport” – not a complete description, but important information about style and genre (not subject)
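The lookup described on this slide (fix V, then list every combination that includes it) can be sketched with a small in-memory index. This is a minimal illustration under stated assumptions, not the authors' database: the class `CombinationDB` and its methods are hypothetical, and the few sample entries are taken from the RosBusinessConsulting combination slides.

```python
from collections import defaultdict


class CombinationDB:
    """Toy combinatorial database for one corpus (hypothetical design)."""

    def __init__(self):
        # pattern -> verb -> {combination tuple: frequency}
        self._index = defaultdict(lambda: defaultdict(dict))

    def add(self, pattern, words, count):
        """Store a combination; the verb is assumed to be the first word."""
        verb = words[0]
        self._index[pattern][verb][tuple(words)] = count

    def with_verb(self, pattern, verb):
        """Fix V and return all its combinations, most frequent first."""
        combos = self._index.get(pattern, {}).get(verb, {})
        return sorted(combos.items(), key=lambda item: item[1], reverse=True)


rbc = CombinationDB()
rbc.add("V+P+N", ("ГОВОРИТЬСЯ", "В", "СООБЩЕНИЕ"), 6509)
rbc.add("V+P+N", ("ЗАЯВИТЬ", "В", "ИНТЕРВЬЮ"), 937)
rbc.add("V+P+N", ("ЗАЯВИТЬ", "О", "НАМЕРЕНИЕ"), 1337)
rbc.add("V+N", ("ЗАЯВИТЬ", "ГЛАВА"), 3465)

for words, count in rbc.with_verb("V+P+N", "ЗАЯВИТЬ"):
    print(" ".join(words), count)
```

Keeping one such object per corpus mirrors the "separate database for each corpus" idea; comparing the top combinations for the same verb across corpora is then a matter of calling `with_verb` on each.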
  • 18. RosBusinessConsulting: V+P+N
      Combination                      Frequency   Gloss
      ГОВОРИТЬСЯ В СООБЩЕНИЕ           6509        the message says
      СООБЩИТЬ В ПРЕССЛУЖБА            5665        according to the press service
      ГОВОРИТЬСЯ В ЗАЯВЛЕНИЕ           4493        the statement says
      СООБЩИТЬ В УПРАВЛЕНИЕ            1865        according to the administration
      ЗАЯВИТЬ О НАМЕРЕНИЕ              1337        to declare the intention
      ОБЪЯВИТЬ О НАМЕРЕНИЕ             1053        to announce the intention
      СООБЩИТЬ НА ПРЕСС-КОНФЕРЕНЦИЯ    985         to announce at a press conference
      ЗАЯВИТЬ В ИНТЕРВЬЮ               937         to say in an interview
      ЗАЯВИТЬ О ГОТОВНОСТЬ             913         to declare readiness
      ГОВОРИТЬСЯ В ДОКЛАД              789         the report says
      ГОВОРИТЬСЯ В МАТЕРИАЛ            745         the material says
      ЗАЯВИТЬ О НЕОБХОДИМОСТЬ          733         to declare the necessity
      ГОВОРИТЬСЯ В ОТЧЕТ               693         the report says
      ВЫСКАЗАТЬ В ЭФИР                 693         to state on the air
      ГОВОРИТЬСЯ В ПРЕСС-РЕЛИЗ         661         the press release says
      ЗАЯВИТЬ В ЭФИР                   637         to declare on the air
  • 19. RosBusinessConsulting: V+N
      Combination                Frequency   Gloss
      ВЫСКАЗАТЬ МНЕНИЕ           3913        to express an opinion
      ЗАЯВИТЬ ГЛАВА              3465        the head announced
      ЗАЯВИТЬ ПРЕЗИДЕНТ          3233        the president announced
      СООБЩИТЬ ПРЕДСТАВИТЕЛЬ     3069        the spokesman informed
      ЗАЯВИТЬ ПРЕДСТАВИТЕЛЬ      2869        the spokesman announced
      СООБЩИТЬ ПРЕСС-СЛУЖБА      2813        the press service informed
      ЗАЯВИТЬ МИНИСТР            2313        the minister announced
      СООБЩИТЬ ГЛАВА             2309        the head informed
      СКАЗАТЬ ГЛАВА              2301        the head said
      СКАЗАТЬ ПРЕЗИДЕНТ          1981        the president said
      СКАЗАТЬ МИНИСТР            1861        the minister said
      СООБЩИТЬ МИНИСТР           1785        the minister informed
      ПОДЧЕРКНУТЬ ГЛАВА          1705        the head underlined
      ПОДЧЕРКНУТЬ ПРЕЗИДЕНТ      1485        the president underlined
      СООБЩИТЬ ИСТОЧНИК          1485        the source reported
      СООБЩИТЬ ЗАМЕСТИТЕЛЬ       1321        the deputy reported
      СКАЗАТЬ ПРЕДСТАВИТЕЛЬ      1101        the spokesman said
      СООБЩИТЬ ПРЕЗИДЕНТ         993         the president informed
  • 20. Conclusion • It is possible to classify communicative verbs as (mostly) informative or (mostly) persuasive • However, this is not our goal: we are focused on corpora, not words • We try to find as many formal markers as possible – collocations, keywords, POS distribution… • Communicative verbs are rather informative markers of media text genres