Mona Diab: Computational Modeling of Sociopragmatic Language Use in Arabic and English Social Media
Transcript

  • 1. Computational Modeling of Sociopragmatic Language Use in Arabic and English Social Media. Mona Diab, The George Washington University
  • 2. Acknowledgement • Joint work on subgroup detection with Dragomir Radev and Amjad Abu Jbara • My students: Muhammad AbdulMageed, Pradeep Dasigi, Weiwei Guo • Collaborative work with Owen Rambow and Kathy McKeown, and their respective groups • Collaborative sociolinguistic observations with Mustafa Mughazy • Work funded by the IARPA SCIL program • Several slides adapted from presentations of the published papers on this work
  • 3. Our Overarching Research Interest • Goal: Attempt to mine social media text for clues and cues toward building an understanding of human interactions • How: Identify interesting sociolinguistic behaviors and correlate them with linguistic usage that is quantifiable and explicitly characterizable as a diagnostic device • Compare these devices cross-linguistically
  • 4. Text and Social Relations • We can use linguistic analysis techniques to understand the implicit relations that develop in on-line communities • Image source: clair.si.umich.edu
  • 5. Many Different Forms of Social Media • Communication • Collaboration • Multimedia • Reviews & opinions
  • 6. Social Media Explosion • 1.73 billion Internet users worldwide. 75% of them used “Social Media” • Source: www.internetworldstats.com
  • 7. Text in Social Media • Some social media applications are all about text
  • 8. Text in Social Media • Even the ones based on photos, videos, etc. have a lot of discussions
  • 9. Text in Social Media • Huge amount of text exchanged in discussions • A significant treasure trove
  • 10. Interesting Sociolinguistic Phenomena: Social Constructs • Multiple Viewpoints (Subgroups) • Influencers • Pursuit of Power • Disputed Topics
  • 11. Approach to processing social construct phenomena • Like any good scientist (or imperialist): divide and conquer – Identify language uses (LU) pertinent to the different social constructs (SC) – Correlate and map these LUs with Linguistic Constructions/Constituents (LC)
  • 12. Granularity Level: Thread
  • 13. Discover relevant LUs • Attempt to persuade • Agreement/disagreement • Negative/positive attitude • Who is talking about whom • Dialog patterns • Signed network • The slide marks two groups of LUs: those that rely on linguistic analysis and those that do not depend on linguistic analysis
  • 14. Discover relevant LUs • Attempt to persuade • Agreement/disagreement • Negative/positive attitude • Who is talking about whom • Dialog patterns • Signed network • The slide marks two groups of LUs: those that rely on linguistic modeling and those that do not depend on linguistic modeling
  • 15. LU: Attempt to Persuade • An expression of opinion (a claim) followed by explicit justification of the claim (an argumentation) – Persuade to believe, not persuade to act – Claim: grounding in experience, commonly respected sources – Argumentation: evidence and support from other discussants • CLAIM: There seems to be a much better list at the National Cancer Institute than the one we’ve got. • ARGUMENTATION: It ties much better to the actual publication (the same 11 sections, in the same order).
  • 16. LU: Agreement and Disagreement • Examine pairs of phrases to model others’ acceptance of the participant’s ideas • Example of Agreement – P1 by Arcadian: There seems to be a much better list at the National Cancer Institute than the one we’ve got. It ties much better to the actual publication (the same 11 sections, in the same order). I’d like to replace that section in this article. Any objections? – P2 by JFW: Not a problem. Perhaps we can also insert the relative incidence as published in this month’s wiki Blood journal • Shared opinion (explicit expression), shared perspective (implicit attitude) • Using word similarity and overlap
  • 17. LU: -ve/+ve Attitude •  The attitude of a discussant/participant in a conversation toward another participant or topic or entity mentioned in the thread •  Characterize –ve and +ve sentences •  Positive: praise, express liking, etc. •  You are great •  Simply elegant and beautiful •  Negative: insult, dislike, disagreement, sarcasm, etc. •  You're a liar. •  You know, you're a pretty absurd individual even by Usenet standards. •  You're just pathetic.
  • 18. LU: Attitude towards another person • (2) PER2: No it hasn't that's a bold faced lie. A definate majority of Americans support the public option. The only people who are against it are the insurance companies and moron social conservatives like you who don't even understand what socialism is.
  • 19. LU: Attitude towards another person • (2) PER2: No it hasn't that's a bold faced lie. A definate majority of Americans support the public option. The only people who are against it are the insurance companies and moron social conservatives like you who don't even understand what socialism is. • Using negative and insulting language. Sentiment and word polarity are the devices used
  • 20. LU: Who is talking about whom • How often a person refers to, or is referred to by, other discourse participants • Use of mentions and their frequencies • Features: IsMyNameUsedByOthers; HaveIUsedOthersName; %OfUsersReferencedByMe; %OfUsersReferencedMe; %OfReferencesByMe; %OfReferencesToMe; ReferencesByMeToWordsRatio (number of references made by me / total number of words I wrote); ReferencesToMeToWordsRatio (number of references to me / total number of words by others)
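A subset of the mention features listed on slide 20 can be sketched as follows. This is a minimal illustration, not the system described in the talk: the input format (a list of `(author, text)` pairs), the function name `mention_features`, and the whole-word, case-insensitive name matching are all assumptions made for the example.

```python
import re

def mention_features(posts, user):
    """Compute a few slide-20 style features for `user`.

    posts: list of (author, text) pairs (an assumed input format).
    A "reference" is a whole-word, case-insensitive match of a user name.
    """
    others = {author for author, _ in posts} - {user}

    def count(name, text):
        return len(re.findall(r"\b" + re.escape(name) + r"\b", text, re.IGNORECASE))

    refs_by_me = refs_to_me = words_by_me = words_by_others = 0
    referenced_by_me, referencing_me = set(), set()
    for author, text in posts:
        n_words = len(text.split())
        if author == user:
            words_by_me += n_words
            for other in others:
                c = count(other, text)
                refs_by_me += c
                if c:
                    referenced_by_me.add(other)
        else:
            words_by_others += n_words
            c = count(user, text)
            refs_to_me += c
            if c:
                referencing_me.add(author)

    def ratio(a, b):
        return a / b if b else 0.0

    return {
        "IsMyNameUsedByOthers": refs_to_me > 0,
        "HaveIUsedOthersName": refs_by_me > 0,
        "%OfUsersReferencedByMe": ratio(len(referenced_by_me), len(others)),
        "%OfUsersReferencedMe": ratio(len(referencing_me), len(others)),
        "ReferencesByMeToWordsRatio": ratio(refs_by_me, words_by_me),
        "ReferencesToMeToWordsRatio": ratio(refs_to_me, words_by_others),
    }

# Toy thread, invented for illustration.
posts = [
    ("Peter", "The law is good."),
    ("Mary", "I disagree with Peter."),
    ("John", "Peter is correct. Mary is wrong."),
    ("Peter", "Thanks John"),
]
features = mention_features(posts, "Peter")
```

The two `%OfReferences…` features are left out because the slide does not say what they are normalized by.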
  • 21. LU: Signed Network • Partition the social medium network into positive and negative links based on polarity of words used • Example: What is the public opinion on the health care reform? 2841 posts, more than 300K words
  • 22. LU: Signed Network • Figure labels: Participants, Interactions
  • 23. LU: Signed Network • Figure labels: Participants, Negative Interaction, Positive Interaction • Very Hot Topic (high percentage of negative links)
  • 24. LU: Signed Network • Against Reform (55%) • Pro Reform (45%)
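The signed-network idea on slides 21 to 24 (sign each link by the polarity of the words exchanged, then look at the share of negative links) can be sketched as below. The tiny word lists, the `(source, target, text)` input format, and the function names are assumptions for illustration only; the talk does not say which lexicon or scoring was used for this step.

```python
# Toy polarity lexicon, invented for this sketch.
POSITIVE = {"good", "great", "correct", "agree"}
NEGATIVE = {"bad", "liar", "clueless", "disagree"}

def sign_links(posts):
    """posts: list of (source, target, text). Returns {(source, target): '+' or '-'}.

    Each link is scored by (#positive words - #negative words), summed over
    all posts on that link; links with a zero score are left unsigned.
    """
    scores = {}
    for source, target, text in posts:
        words = [w.strip(".,!?").lower() for w in text.split()]
        score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
        scores[(source, target)] = scores.get((source, target), 0) + score
    return {edge: ("+" if s > 0 else "-") for edge, s in scores.items() if s != 0}

def negative_share(signed):
    """Fraction of signed links that are negative (0.0 if there are none)."""
    if not signed:
        return 0.0
    return sum(1 for s in signed.values() if s == "-") / len(signed)

posts = [
    ("Mary", "Peter", "I totally disagree with you."),
    ("John", "Peter", "He is correct."),
    ("Alexander", "Peter", "You are clueless, Peter."),
]
signed = sign_links(posts)
share = negative_share(signed)
```

A high `negative_share` corresponds to what slide 23 calls a very hot topic.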
  • 25. LU: Dialog Patterns • Dialog Patterns are based on metadata (e.g., the thread structure), not the text – Initiative: who started the thread – Investment: share of participation – Irrelevance: how often ignored by others – Interjection: at what point joined conversation – Incitation: how long are branches started – Inquisitiveness: the number of question marks
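Most of the dialog patterns on slide 25 can be read off the thread structure directly. The sketch below is one possible reading, under stated assumptions: posts arrive as `(post_id, author, parent_id, text)` tuples in chronological order, the user has at least one post, and each measure is normalized in the simplest way available. Incitation (branch length) is omitted. None of these choices come from the talk.

```python
def dialog_patterns(posts, user):
    """posts: list of (post_id, author, parent_id, text), oldest first.

    Assumes `user` wrote at least one post in the thread.
    """
    mine = [p for p in posts if p[1] == user]
    replied_to = {parent for _, _, parent, _ in posts if parent is not None}
    first_index = next(i for i, p in enumerate(posts) if p[1] == user)
    return {
        # Initiative: did the user start the thread?
        "initiative": posts[0][1] == user,
        # Investment: share of all posts written by the user.
        "investment": len(mine) / len(posts),
        # Irrelevance: share of the user's posts that nobody replied to.
        "irrelevance": sum(1 for p in mine if p[0] not in replied_to) / len(mine),
        # Interjection: relative position of the user's first post.
        "interjection": first_index / len(posts),
        # Inquisitiveness: number of question marks in the user's posts.
        "inquisitiveness": sum(p[3].count("?") for p in mine),
    }

thread = [
    (1, "Peter", None, "The law is good."),
    (2, "Mary", 1, "Why? I disagree."),
    (3, "John", 1, "Peter is right."),
    (4, "Alexander", 2, "Really?"),
]
mary = dialog_patterns(thread, "Mary")
```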
  • 26. Interesting Sociolinguistic Phenomena: Social Constructs • Multiple Viewpoints (Subgroups) • Influencers • Pursuit of Power • Disputed Topics
  • 27. Who is an Influencer? • Someone whose opinions/ideas profoundly affect the conversation • An influencer may have the following characteristics (Katz and Lazarsfeld, 1955) – alter the opinions of their audience – resolve disagreements where no one else can – be recognized by others as one who makes important contributions – often continue to influence a group even when not present – have other conversational participants adopt their ideas and even the words they use to express their ideas • More formally, an influencer: – Has credibility in the group – Persists in attempting to convince others, even if some disagreement occurs – Introduces topics/ideas that others pick up on or support
  • 28. Social Construct: Influencer (Inf) • Language Uses – Attempt to Persuade – Agreement/disagreement
  • 29. What is Pursuit of Power? •  Individual makes repeated efforts to gain power within the group. •  The individual attempts to control the actions or goals of the group. •  Individual’s behavior causes tension within the group
  • 30. Social Construct: Pursuit of Power (PoP) • Language Uses – Attempt to Persuade – Agreement/disagreement – Negative/positive attitude – Who is talking about whom – Dialog patterns (non-linguistic)
  • 31. Social Construct: Subgroup Detection • Figure labels: Discussion Thread, Discussant, Subgroups
  • 32. Social Construct: Subgroup (Sub) • Language Uses – Agreement/disagreement – Negative/positive attitude – Signed Network (non-linguistic)
  • 33. Cross-Linguistic Comparison • The SCs in both languages use the same LUs • But do Arabic and English social media use different linguistic constituents to show language use? • A qualitative view:
  • 34. Attempt to persuade • Claims – A lot more grounding using religious references – Religion plays a significant role in Arabic discourse structure and is therefore used to establish credibility and, accordingly, influence and power differentials • Easily detected using simple devices such as explicit diacritization – Less subjective language (less usage of “I”, more of “we”, or expletives such as “there, it”) – Example: حاول أن تتفهم أننا نحاول هنا بناء موسوعة بلغة معاصرة .. ثانيا إشكالية الملحوظية ... هل يمكنك أن تخبرني عن حدث مميز في حياة علقمة بن وقاص [roughly: Try to understand that we are trying to build an encyclopedia here in contemporary language .. Second, the notability problem ... Can you tell me about a notable event in the life of Alqama ibn Waqqas?]
  • 35. Agreement/Disagreement • Sharing the same opinion regarding a topic – Explicit agreement • “I agree with you about …” • هذه نقطة هامة للغاية أوافقك فيها تماماً، أشكرك على صياغتها بمثل هذا الوضوح [roughly: This is a very important point on which I fully agree with you; thank you for phrasing it with such clarity] • أنا أوافقك أننا موسوعة في طور البناء [roughly: I agree with you that we are an encyclopedia under construction] – Implicit similar attitude toward a topic • Challenge • Pervasive sarcasm • Pervasive use of MWE and references to cultural knowledge
  • 36. Detecting (dis)agreements/attitudes? • The role of idiom/metaphor/sarcasm in Arabic seems to be more pervasive – Tongue twisters, witty language, puns • حمزاوي شعر والا دقن في المجلس • MP Hamzawy, being liberal, has long hair compared to the MB candidates, who have beards, so the bet is on whether he will grow his hair longer or grow a beard • قلب المرأة مانجه يساع بذرة واحدة ولكن قلب الراجل ما شاء الله بطيخة • The heart of a woman is like a mango and can only hold one seed, but a man’s heart is, “God bless”, a melon – Sarcasm • يالا بسيطه! • No problem, it is easy! (We are screwed regardless!)
  • 37. Negative/positive Attitude • Very flowery language compared to English • Strong condescending language to show negative attitude • Code switching into dialectal Arabic expressions to show support – Manipulate different registers for code switching depending on context: CA with MSA/DA code switching to reflect influence • Ben Ali, Tunisian President vs. Mubarak, Egyptian president in ouster speech • Mubarak, ex-Egyptian President, on visit to factories/ouster from position in last revolution • Mubarak vs. Nasser vs. Sadat – Balance between familiarity and distance
  • 38. Negative/positive Attitude • Plural first person pronouns allow the speaker to reduce his/her power to establish rapport and show positive attitude – e.g., إحنا جالنا الشرف vs. أنا جالي الشرف – We are honored vs. I am the honored one • English plural pronouns in such contexts sound patronizing (the textbook “we”), whereas the “royal we” is disused.
  • 39. Negative/Positive attitude • Humor is commonly used in Arabic as a strategy that levels power relations, but that would be inappropriate in English. • Slightly offensive expressions are used in Arabic to maintain power balance and solidarity, e.g., • اسكت، مش محمد نجح [roughly: Shut up, didn't Mohamed pass!] • والنبي نقطنا بسكاتك [roughly: By the Prophet, grace us with your silence] • Only very few such expressions are acceptable in English and in very close contexts, e.g., shut up and get out of here.
  • 40. Talking about whom and to whom • More manipulation of power differential – MSA terms of address add formality, and therefore power to the speaker, whereas colloquial terms of address establish informal/equal levels of power. • Compare يا سيدي العزيز to يا خويا. • English does not have such a rich continuum of formality/informality expressions. • Usage of expressions such as – Mona: Mona could not dare refuse a request from Ali – Considered strange self-reference in English, but it is used as a means of showing modesty and familiarity
  • 41. Focus of this talk • Influencers • Pursuit of Power • Disputed Topics • Multiple Viewpoints (Subgroups)
  • 42. Focus of this talk • Peter: The new immigration law is good. Illegal immigration is bad. • Mary: I totally disagree with you. This law is blatant racism. • John: Have you read all what Peter wrote? He is correct. Illegal immigration is bad and must be stopped. • Alexander: You are clueless, Peter. Stop supporting racism. • Support the new law: Peter, John • Against the new law: Mary, Alexander
  • 43. Sample thread
  • 44. Subgroup Detection System Overview • Input: Discussion Thread (Discussants) • Stages: Thread Parsing; Opinion Expressions Identification (e.g., “disagree”, “like”, “bad”); Candidate Target Identification (e.g., “you”, “conservative ideologues”, “immigration law”); Opinion-Target Pairing (disagree → You, like → Conservative Ideologues, bad → Immigration law); Discussant Attitude Profiles (DAPs); Clustering • Also shown: Reply Structure • Output: Subgroups
  • 46. 1 - Thread Parsing • P1, D1 (Peter): The new immigration law is good. Illegal immigration is bad. • P2, D2 (Mary): I totally disagree with you. This law is blatant racism. • P3, D3 (John): Have you read all what Peter wrote? He is correct. Illegal immigration is bad and must be stopped. • P4, D4 (Alexander): You are clueless, Peter. Stop supporting racism. • Identify Posts, Discussants, and the reply structure of the discussion thread
  • 48. 2 - Identify Opinion Words* • P1, D1 (Peter): The new immigration law is good+. Illegal immigration is bad-. • P2, D2 (Mary): I totally disagree- with you. This law is blatant- racism-. • P3, D3 (John): Have you read all what Peter wrote? He is correct+. Illegal immigration is bad- and must be stopped. • P4, D4 (Alexander): You are clueless-, Peter. Stop supporting racism. • *Identifying opinion words using Opinion Finder with an extended lexicon (implemented using random walks; Hassan & Radev, 2011)
  • 50. 3 - Identify Candidate Targets of Opinion • Target: Discussant (e.g., you, Peter) or Topic/Entity (e.g., The new immigration Law, Illegal Immigration)
  • 51. 3 - Identify Candidate Targets of Opinion • P1, D1 (Peter): The new immigration law is good+. Illegal immigration is bad-. • P2, D2 (Mary): I totally disagree- with you. This law is blatant- racism-. • P3, D3 (John): Have you read all what Peter wrote? He is correct+. Illegal immigration is bad- and must be stopped. • P4, D4 (Alexander): You are clueless-, Peter. Stop supporting racism. • Candidate Targets: all discussants are candidate targets
  • 52. 3 - Identify Candidate Targets of Opinion • P1, D1 (Peter): The new immigration law is good+. Illegal immigration is bad-. • P2, D2 (Mary): I totally disagree- with you. This law is blatant- racism-. • P3, D3 (John): Have you read all what Peter wrote? He is correct+. Illegal immigration is bad- and must be stopped. • P4, D4 (Alexander): You are clueless-, Peter. Stop supporting racism. • Candidate Targets: identify discussant mentions (2pp or name) in the discussion • Labeled mentions: D1, D1, D1, D2
  • 53. 3 - Identify Candidate Targets of Opinion • P1, D1 (Peter): The new immigration law is good+. Illegal immigration is bad-. • P2, D2 (Mary): I totally disagree- with you. This law is blatant- racism-. • P3, D3 (John): Have you read all what Peter wrote? He is correct+. Illegal immigration is bad- and must be stopped. • P4, D4 (Alexander): You are clueless-, Peter. Stop supporting racism. • Candidate Targets: identify anaphoric mentions of discussants • Labeled mentions: D1, D1, D1, D1 (Peter), D2
  • 54. 3 - Identify Candidate Targets of Opinion • P1, D1 (Peter): The new immigration law is good+. Illegal immigration is bad-. • P2, D2 (Mary): I totally disagree- with you. This law is blatant- racism-. • P3, D3 (John): Have you read all what Peter wrote? He is correct+. Illegal immigration is bad- and must be stopped. • P4, D4 (Alexander): You are clueless-, Peter. Stop supporting racism. • Labeled mentions: D1, D1, D1, D1 (Peter), D2; Topic1, Topic1, Topic2, Topic2 • Candidate Targets: Topic 1, Topic 2
  • 55. 3 - Identify Candidate Targets of Opinion • Techniques used to identify topical targets: – Named Entity Recognition – Noun phrase chunking
  • 57. 4 - Opinion-Target Pairing • P2 (Mary): I totally disagree- with you [D1]. The new immigration law [Topic1] is blatant- racism-. • Dependencies, sentence 1: nsubj(disagree-3, I-1) advmod(disagree-3, totally-2) root(ROOT-0, disagree-3) prep_with(disagree-3, you-5) • Dependencies, sentence 2: nsubj(racism-4, Topic1-1) cop(racism-4, is-2) amod(racism-4, blatant-3) root(ROOT-0, racism-4) • A pairing rule is applied to each dependency parse
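The dependency triples on slide 57 suggest how a pairing rule might work. The sketch below uses one hypothetical rule, which is an assumption and not the rule set from the talk: an opinion word is paired with a candidate target when the target is a direct `nsubj` or `prep_with` dependent of that word. The triples are typed in by hand from the slide; no parser is called.

```python
# Relations that this illustrative rule accepts (an assumption).
PAIRING_RELATIONS = ("nsubj", "prep_with")

def pair_opinions_with_targets(deps, opinion_words, targets):
    """Return (target_id, polarity) pairs found in one parsed sentence.

    deps: list of (relation, head, dependent) triples.
    opinion_words: dict mapping an opinion word to '+' or '-'.
    targets: dict mapping a candidate target mention to its target id.
    """
    pairs = []
    for relation, head, dependent in deps:
        if relation in PAIRING_RELATIONS and head in opinion_words and dependent in targets:
            pairs.append((targets[dependent], opinion_words[head]))
    return pairs

# Sentence 1: "I totally disagree with you."
deps1 = [
    ("nsubj", "disagree", "I"),
    ("advmod", "disagree", "totally"),
    ("prep_with", "disagree", "you"),
]
pairs1 = pair_opinions_with_targets(deps1, {"disagree": "-"}, {"you": "D1"})

# Sentence 2: "The new immigration law is blatant racism." (subject collapsed to Topic1)
deps2 = [
    ("nsubj", "racism", "Topic1"),
    ("cop", "racism", "is"),
    ("amod", "racism", "blatant"),
]
pairs2 = pair_opinions_with_targets(
    deps2, {"racism": "-", "blatant": "-"}, {"Topic1": "Topic1"}
)
```

In this sketch `pairs1` links the negative word "disagree" to discussant D1, and `pairs2` links "racism" to Topic1; "blatant" is not paired because it has no target among its own dependents.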
  • 58. Named entity rules
  • 59. 4 - Opinion-Target Pairing • P1, D1 (Peter): The new immigration law is good+. Illegal immigration is bad-. • P2, D2 (Mary): I totally disagree- with you. This law is blatant- racism-. • P3, D3 (John): Read all what Peter wrote. He is correct+. Illegal immigration is bad- and must be stopped. • P4, D4 (Alexander): You are clueless-, Peter. Stop supporting racism. • Labeled mentions: D1, D1, D1, D1 (Peter); Topic1, Topic1, Topic2, Topic2 • Candidate Targets: Topic 1, Topic 2
  • 60. 4 - Opinion-Target Pairing • Language Uses (LUs) present in this step: – Targeted sentiment toward other discussants (2nd person) – Targeted sentiment toward topic mentions (3rd person) • Examples: I totally disagree- with you. This law is blatant- racism-.
  • 61. 4 - Opinion-Target Pairing • LU details – Rule-based detection of sentiment targets (we’ve also been experimenting with supervised target detection methods) – Discussant targets are identified by 2nd person pronouns (you, your, yourself, etc.) and by username mentions (casper3912, etc.)
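The discussant-target rule on slide 61 (second person pronouns plus username mentions) can be sketched as follows. The function name, the pronoun list beyond the three given on the slide, and the choice to resolve a second person pronoun to the author of the post being replied to are assumptions made for this example.

```python
import re

# "you, your, yourself" come from the slide; the other forms are added here.
SECOND_PERSON = {"you", "your", "yours", "yourself", "yourselves"}

def discussant_targets(text, reply_to, usernames):
    """Return the set of discussants targeted by a post.

    text: the post text.
    reply_to: author of the parent post, or None for a thread starter.
    usernames: known discussant names in the thread.
    """
    tokens = set(re.findall(r"[a-z0-9_]+", text.lower()))
    # Username mentions: exact token match, ignoring case.
    targets = {name for name in usernames if name.lower() in tokens}
    # Second person pronouns are assumed to point at the parent post's author.
    if reply_to is not None and tokens & SECOND_PERSON:
        targets.add(reply_to)
    return targets

names = {"Peter", "Mary", "John", "Alexander"}
t1 = discussant_targets("I totally disagree with you.", "Peter", names)
t2 = discussant_targets("Have you read all what Peter wrote?", "Mary", names)
t3 = discussant_targets("Stop supporting racism.", "Peter", names)
```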
  • 63. 5 - Discussant Attitude Profile • Each profile (DAP1, DAP2, …) holds three entries (+, -, int) for every target, Target1 … Targetn
  • 64. 5 - Discussant Attitude Profile • Axis labels: Discussants; Targets (Peter, Mary, John, Alexander, Topic 1, Topic 2) • Values as extracted: 0 0 0 0 0 0 1 0 1 0 0 0 1 0 1 0 1 1 0 0 0 0 0 0 0 1 1 1 0 1 0 2 2 0 0 0 0 0 0 1 0 1 1 0 2 0 0 0 0 0 0 0 1 1 1 0 1 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0
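One way to assemble a profile of the shape shown on slide 63 is sketched below. It assumes that the three entries per target are the number of positive opinion pairs, the number of negative opinion pairs, and the number of interactions ("int"); the slide does not define "int", so that reading, the input formats, and the function name `build_dap` are all assumptions.

```python
def build_dap(discussant, targets, opinion_pairs, interactions):
    """Build a flat attitude profile vector for one discussant.

    targets: ordered list of target ids (discussants and topics).
    opinion_pairs: list of (source, target, polarity) with polarity '+' or '-'.
    interactions: list of (source, target) pairs, e.g. replies.
    Returns [pos, neg, int] for each target, concatenated in target order.
    """
    vector = []
    for target in targets:
        pos = sum(1 for s, t, p in opinion_pairs
                  if s == discussant and t == target and p == "+")
        neg = sum(1 for s, t, p in opinion_pairs
                  if s == discussant and t == target and p == "-")
        inter = sum(1 for s, t in interactions if s == discussant and t == target)
        vector.extend([pos, neg, inter])
    return vector

# Toy data loosely based on Mary's post in the example thread.
targets = ["Peter", "Mary", "Topic1"]
opinion_pairs = [
    ("Mary", "Peter", "-"),
    ("Mary", "Topic1", "-"),
    ("Mary", "Topic1", "-"),
]
interactions = [("Mary", "Peter")]
mary_dap = build_dap("Mary", targets, opinion_pairs, interactions)
```

Vectors built this way for every discussant have equal length, so they can be passed to any standard clustering routine.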
  • 66. Clustering • Two subgroups (Subgroup 1, Subgroup 2): one with Peter (Topic1 +, Topic 2 -) and John (Peter +, Topic 2 -), the other with Mary (Peter -, Topic1 -) and Alexander (Peter -)
  • 67. Evaluation
  • 68. Data • 117 discussions: short threads, short posts, human annotation, more formal • 12 polls + discussions: long threads, long and short posts, data self-labeled, less formal • 30 debates: long threads, long and short posts, data self-labeled, less formal
  • 69. Evaluation dataset
  • 70. Evaluation Metrics • 1. Purity • Source: http://nlp.stanford.edu/IR-book/html/htmledition/evaluation-of-clustering-1.html
  • 71. Evaluation Metrics • 2. Entropy • 3. F-Measure • where P(i, j) is the probability of finding an element from the category i in the cluster j, n_j is the number of items in cluster j, and n is the total number of items in the distribution.
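The formulas on slides 70 and 71 did not survive in the transcript. The sketch below uses the standard definitions, which match the symbols described on slide 71: purity is the fraction of items that fall in the majority category of their cluster, and entropy is the sum over clusters j of (n_j / n) times the entropy of P(i, j) within that cluster. The base-2 logarithm and the input format are assumptions; F-measure is not included.

```python
import math
from collections import Counter

def purity(clusters):
    """clusters: list of non-empty clusters, each a list of gold category labels."""
    n = sum(len(cluster) for cluster in clusters)
    return sum(max(Counter(cluster).values()) for cluster in clusters) / n

def entropy(clusters):
    """Size-weighted average of the per-cluster category entropy (log base 2)."""
    n = sum(len(cluster) for cluster in clusters)
    total = 0.0
    for cluster in clusters:
        n_j = len(cluster)
        # P(i, j): share of category i among the items of cluster j.
        h_j = -sum((k / n_j) * math.log2(k / n_j) for k in Counter(cluster).values())
        total += (n_j / n) * h_j
    return total

# Toy clustering: the first cluster mixes two categories, the second is pure.
example = [["a", "a", "b"], ["b", "b"]]
p = purity(example)
e = entropy(example)
```

Higher purity and lower entropy both indicate clusters that agree better with the gold categories.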
  • 72. English Results
                  Wikipedia   Political Forum   Create debate
      Purity      0.66        0.61              0.64
      Entropy     0.55        0.80              0.68
      F-measure   0.61        0.56              0.60
  • 73. Baselines • Interaction Graph Clustering (GC) – Nodes: Participants – Edges: interactions (connect two participants if they exchange posts) • Text Classification (TC) – Build TF-IDF vectors for each participant (using all his/her posts) – Cluster the vector space
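The text baseline on slide 73 builds one TF-IDF vector per participant. A minimal pure-Python sketch is given below; whitespace tokenization, the tf × log(N/df) weighting, and the function names are assumptions, since the slide does not specify the variant used. The clustering step itself is left out; cosine similarity is shown only as one way to compare the vectors.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """docs: dict mapping participant -> all of their posts joined into one string.

    Returns dict participant -> {term: tf * log(N / df)}.
    """
    tokenized = {p: text.lower().split() for p, text in docs.items()}
    df = Counter()
    for tokens in tokenized.values():
        df.update(set(tokens))
    n_docs = len(docs)
    vectors = {}
    for participant, tokens in tokenized.items():
        tf = Counter(tokens)
        vectors[participant] = {
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

docs = {
    "Peter": "law good immigration bad",
    "John": "immigration bad stopped",
    "Mary": "law racism",
}
vecs = tfidf_vectors(docs)
sim_peter_john = cosine(vecs["Peter"], vecs["John"])
sim_peter_mary = cosine(vecs["Peter"], vecs["Mary"])
sim_john_mary = cosine(vecs["John"], vecs["Mary"])
```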
  • 74. Comparison to baselines • Chart label: Our System
  • 75. Choice of Clustering Algorithm • K-means • Expectation Maximization (EM) • Farthest First (FF)
  • 77. Component Evaluation • Our System • No Topical Targets • No Discussant Targets • No Sentiment • No Interaction • No Anaphora Resolution • No Named Entity Recog. • No NP Chunking
  • 78. Component Evaluation • Our System • No Topical Targets • No Discussant Targets • No Sentiment • No Interaction • No Anaphora Resolution • No Named Entity Recog. • No NP Chunking • Annotation: Not really a linguistic feature
  • 79. Component Evaluation • Our System • No Topical Targets • No Discussant Targets • No Sentiment • No Interaction • No Anaphora Resolution • No Named Entity Recog. • No NP Chunking • Annotation: More of a linguistic feature!
  • 80. Deeper look at Agreement/Disagreement and Attitude • So far we employed shared/divergent opinion in the form of explicit polarity indicators – Sentiment polarity towards other discussants • A: So, no matter how much faith you have, one of you MUST be wrong! (negative) • B: You are a scientist?! May I ask in which field? (negative) – Sentiment polarity towards an entity • A: Here is an excellent verse from the Bible.. (positive) • B: The Bible rightly says that... (positive)
  • 81. Implicit Opinion/Perspective • Observation: People sharing similar beliefs/perspective tend to use the same evidence to support their point – Believers: faith, peace, love, citing verses from the Bible... – Atheists: reason, science, attack on perceived logical flaws in Bible... • However it is not always explicit (using similar words and similar attitudes) • Peter: God is the creator of mankind • Mary: The belief in an ultimate divine being has sustained me over the years – Not necessarily positive/negative – High dimensional similarity (looking at the surface words) between both sentences is low! – BUT we know Mary and Peter share the same perspective and will tend to be in agreement with each other
  • 82. Modeling of implicit agreement/disagreement • Implicit agreement or disagreement (perspective): using text similarity to help identify subgroups • Perspective modeling is used to complement explicit attitude • Perspective granularity has to be collected on the level of a thread rather than a single post – Hence we summarize all the posts in the thread
  • 83. Our Model • Explicit high dimensional attitude toward other discussants • Explicit high dimensional attitude toward named entities • Model shared perspective among discussants over threads using textual similarity on the post level in the latent space
  • 84. Extracting explicit attitude toward other discussants • Identify polarity of each sentence • Use the thread structure of the discussion to identify the target discussant • If the sentence has second person pronouns (Hassan et al., 2010), then the polarity is assumed to be towards the target of the sentence
  • 85. Extracting explicit attitude toward named entities • Identify polarity of each sentence • Run Stanford Named Entity Tagger on sentences • If the sentence has Named Entities, then the polarity is assumed to be towards those entities
  • 86. Extracting implicit perspective • Run Latent Dirichlet Allocation (LDA) on the thread • Extract the topic distribution of each post • Aggregate the distributions of all posts between each pair of discussants
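The aggregation step on slide 86 can be illustrated without running LDA itself. The sketch below assumes the per-post topic distributions have already been produced by some LDA implementation, and it takes "aggregate" to mean averaging the distributions of all posts an author addressed to a given discussant; both the averaging and the `(author, addressee, distribution)` input format are assumptions.

```python
from collections import defaultdict

def aggregate_pair_topics(posts):
    """Average topic distributions per ordered (author, addressee) pair.

    posts: list of (author, addressee, topic_distribution), where each
    distribution is a list of floats of the same length.
    """
    sums = {}
    counts = defaultdict(int)
    for author, addressee, dist in posts:
        key = (author, addressee)
        if key not in sums:
            sums[key] = [0.0] * len(dist)
        sums[key] = [a + b for a, b in zip(sums[key], dist)]
        counts[key] += 1
    return {key: [v / counts[key] for v in total] for key, total in sums.items()}

# Two-topic toy distributions, invented for illustration.
posts = [
    ("A", "B", [0.5, 0.5]),
    ("A", "B", [1.0, 0.0]),
    ("B", "A", [0.0, 1.0]),
]
pair_topics = aggregate_pair_topics(posts)
```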
  • 87. Feature Representation: Attitude Profiles • Vector Representation • Explicit attitude toward other discussants • Columns: A, B, C • Row A: 0 1 1 | 1 1 2 | 0 0 0 • Row B: … • Row C: --
  • 88. Feature Representation: Attitude Profiles • Vector Representation • Explicit attitude toward Entities • Columns: A, B, C, E1, E2 • Row A: 0 1 1 | 1 1 2 | 0 0 0 | 1 1 2 | 1 0 1 • Row B: … • Row C: --
  • 89. Feature Representation: Attitude Profiles • Vector Representation • Implicit attitude toward other discussants • Columns: A, B, C, E1, E2, A, B, C • Row A: 0 1 1 | 1 1 2 | 0 0 0 | 1 1 2 | 1 0 1 | 1 1 1 | 1 0 0.5 | 0.5 0 0 • Row B: … • Row C: -- 1 1 1
  • 90. Data • Create Debate (CD) – www.createdebate.com – Debating on a certain topic – Sides are explicitly indicated by discussants in a poll – Informal language • Wikipedia Discussion Forum (WIKI) – en.wikipedia.org – Group labels are manually annotated – Formal language, not much negative polarity
  • 91. Experimental Conditions • Clustering algorithm – S-Link; # of clusters by rule of thumb = √n/2 • Evaluation Metrics – Purity, Entropy, F-measure • Baseline – RAND-BASE: Assign discussants to clusters randomly – SWD-BASE: Calculate surface word distribution, as a simpler form of perspective
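The clustering setup on slide 91 can be sketched under two assumptions that the slide does not confirm: that “S-Link” refers to single-linkage agglomerative clustering, and that the rule of thumb reads k ≈ √(n/2). The naive merge loop below is written for clarity, not speed, and Euclidean distance and rounding to the nearest integer are further assumptions.

```python
import math

def rule_of_thumb_k(n):
    """Number of clusters as round(sqrt(n / 2)), with a minimum of 1."""
    return max(1, round(math.sqrt(n / 2)))

def single_link(vectors, k):
    """Naive single-linkage clustering.

    vectors: dict mapping a name to a list of floats.
    Repeatedly merges the two clusters whose closest members are nearest
    (Euclidean distance) until only k clusters remain.
    """
    clusters = [[name] for name in vectors]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(math.dist(vectors[a], vectors[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Toy attitude vectors, invented for illustration.
vectors = {
    "Peter": [1, 0, 0, 1],
    "John": [1, 0, 0, 1],
    "Mary": [0, 1, 1, 0],
    "Alexander": [0, 1, 0, 0],
}
subgroups = single_link(vectors, 2)
```

With these toy vectors the two resulting clusters group Peter with John and Mary with Alexander.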
  • 92. Results
        Condition     Wiki                          CD
                      Purity  Entropy  F-measure    Purity  Entropy  F-measure
        RAND-BASE     0.675   0.563    0.652        0.399   0.966    0.41
        SWD-BASE      0.772   0.475    0.646        0.452   0.932    0.432
        SD            0.834   0.360    0.667        0.824   0.394    0.596
        SE            0.827   0.383    0.655        0.793   0.422    0.582
        SD+SE         0.835   0.362    0.665        0.82    0.385    0.604
        PERS          0.853   0.321    0.699        0.787   0.399    0.589
        SD+PERS       0.853   0.320    0.698        0.849   0.333    0.615
        SE+PERS       0.853   0.321    0.702        0.789   0.399    0.591
        SD+SE+PERS    0.857   0.310    0.703        0.861   0.315    0.625
  • 93. Observations • Best performance is when we combine explicit attitude (SD: sentiment toward other discussants; SE: sentiment toward entities) with implicit perspective (PERS), regardless of genre
  • 94. Observations • WIKI seems to gain more from implicit perspective compared to CD • Explicit attitude is a better feature for CD: people express their sentiments openly, while in WIKI people are more constrained and subtle in their expressions
  • 95. Observations • Better results than the previous results obtained on the same data sets: WIKI (P 0.66, E 0.55), CD (P 0.64, E 0.68)
  • 96. Our Social Constructs • Multiple Viewpoints (Subgroups) • Influencers • Pursuit of Power
  • 97. The LUs used in Final System • Attempt to persuade (Inf) • Agreement/disagreement (Inf, Sub) • -ve/+ve attitude without perspective (Sub) • Who is talking about whom (PoP) • Dialog patterns (PoP) • Signed network (Sub); Do not depend on linguistic analysis; Rely on linguistic analysis
  • 98. LUs and SCs • LU/SC columns: Influencer, Pursuit of Power, Subgroup • Attempt to Persuade ✔ • Agreement/Disagreement ✔ ✔ • -ve/+ve attitude ✔ ✔ • Who is talking about whom ✔ • Dialogue Patterns ✔ • Signed Networks ✔
  • 99. Challenges with processing Arabic Social media • Genre – Wikipedia • MSA with dialectal style and multiword expressions/lexical items – Blogs from BOLT mostly dialectal with pervasive code switching and semantic faux amis • Implications for preprocessing – Our tools are trained on formal MSA genres • Hence degradation in basic NLP processing, for example POS tagging in MSA is 97% accuracy, in Blog data we are at 94% (on a good day!)
  • 100. Formal Gov. Evaluation (nDCG%)
        09/2012                          En-WIKI  En-Fora  Ar-WIKI  Ar-Fora
        Subgroup (without perspective)   48.2     50.6     57.4     37.5
        Influencer                       82.8     78.3     85.1     84.9
        Pursuit of Power                 87.8     77.7     91.6     74.6
  • 101. Formal Gov. Evaluation (nDCG%) 09/2012 • In general, Subgroup is the hardest
  • 102. Formal Gov. Evaluation (nDCG%) 09/2012 • Pursuit of power relies mostly on shallow linguistic features (mentions) and dialog structure
  • 103. Formal Gov. Evaluation (nDCG%) 09/2012 • Fora are harder to deal with than WIKI genre
  • 104. Formal Gov. Evaluation (nDCG%) 09/2012 • Arabic WIKI did better than English WIKI
  • 105. Formal Gov. Evaluation (nDCG%) 09/2012 • Arabic Influencer significantly impacted by simple diacritization detection for claims (grounding)
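The evaluation slides report nDCG (normalized discounted cumulative gain) as a percentage. A minimal sketch of the standard definition follows; how the government evaluation defined relevance grades or ranked system outputs is not stated in the slides, so the inputs here are purely illustrative and the function names are hypothetical.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain of a ranked list of relevance grades."""
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(relevances, start=1))


def ndcg(relevances):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


perfect = ndcg([3, 2, 1])   # already in ideal order
swapped = ndcg([1, 3])      # most relevant item ranked second
```

Multiplying the result by 100 gives a percentage on the same scale as the scores in the evaluation table.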
  • 106. Conclusions • We can successfully computationally model sociopragmatic phenomena – There is significant room for improvement • Still discovering how to model the phenomena in a more language-specific manner – We are just scratching the surface of understanding the sociopragmatic linguistic features • NOW more than ever collaborations are necessary
  • 107. Any takers! ودمتم Thank you. Questions?