Regina Barzilay, "Extracting Information from Social Media"

January 31, "MIT Day at Yandex" seminar

- Machine learning methods applied to extracting information from online user-generated content.

- A look at a set of information extraction tasks, such as aspect-based analysis of reviews and building an event database from tweets.

- Automatically inducing the content structure of a document from a large, very noisy stream of user-generated content.

- Automatic aggregation of review content and extraction of events from a stream of Twitter messages.


Transcript

  • 1. Information Extraction for Social Media
      Regina Barzilay
      Christina Sauper, Aria Haghighi, and Ted Benson
  • 2. Selecting a Hotel
  • 3. Selecting a Hotel
  • 4. Selecting a Hotel
  • 5. User-generated Content
      •  Large amounts of user-generated content
      •  Increasingly important in decision making
      •  Time-consuming to read it all
      NLP can help!
  • 6. The Power of Word Counts
      Simple statistical models are effective for many information extraction tasks
      •  Bag-of-words approaches for classification
      Example words: trading, financial, stocks, bank, cloudy, cold, storm, plants
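A minimal sketch of the bag-of-words classification idea from slide 6, written as a multinomial Naive Bayes classifier over word counts. The choice of Naive Bayes, the two class names, and the toy training sentences are assumptions made for illustration; the slide only lists the example words and does not say which classifier was meant.

```python
import math
from collections import Counter

# Hypothetical toy training data built from the example words on the slide.
# The class names "finance" and "weather" are illustrative assumptions.
TRAIN = [
    ("trading financial stocks bank", "finance"),
    ("stocks bank trading", "finance"),
    ("cloudy cold storm", "weather"),
    ("storm cold cloudy plants", "weather"),
]


def train_naive_bayes(examples):
    """Collect per-class word counts, class frequencies, and the vocabulary."""
    word_counts = {}          # label -> Counter of word occurrences
    class_counts = Counter()  # label -> number of training documents
    vocab = set()
    for text, label in examples:
        tokens = text.lower().split()
        word_counts.setdefault(label, Counter()).update(tokens)
        class_counts[label] += 1
        vocab.update(tokens)
    return word_counts, class_counts, vocab


def classify(text, model):
    """Return the label with the highest add-one-smoothed log probability."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label in sorted(class_counts):
        counts = word_counts[label]
        denom = sum(counts.values()) + len(vocab)
        score = math.log(class_counts[label] / total_docs)
        for tok in text.lower().split():
            if tok in vocab:  # words never seen in training are ignored
                score += math.log((counts[tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train_naive_bayes(TRAIN)
print(classify("bank stocks", model))
print(classify("cold storm tonight", model))
```

Word order plays no role here: each document is reduced to its word counts, which is the sense of "bag of words" on the slide.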
  • 7. The Power of Word Counts
      Simple statistical models are effective for many information extraction tasks
      •  Sequence labeling for semantic role labeling
      the    earthquake   injured   three        people
      NONE   EVENT        NONE      CASUALTIES   CASUALTIES
  • 8. The Power of Word Counts
  • 9. The Power of Word Counts
      "Every time I fire a linguist, the performance of the speech recognizer goes up" (F. Jelinek)
  • 10. Beyond Wall Street Journal
      Moving from formal text…
      …to social media
  • 11. Our Approach
      •  Model document structure as part of the extraction process
      •  Exploit large amount of raw data to supplement annotations
  • 12. [Slide image: a collage of overlapping user reviews of the restaurant Oleana; the review texts are interleaved in the extraction and cannot be recovered individually. Source shown on the slide: http://condensr.com]
  • 13. Motivating Example
      Aspect        Snippets
      atmosphere    stylish decor; awesome art
      food          loved it!; tasty calzones!
      service       fast and friendly; impatient waiters
      Importance of Context:
      ... by local artists. [food: Ordered chicken parm and loved it! Friend had the veal.] The service was ...
  • 14. Multi-Aspect Summarization
      Content Topic Model
        I ordered lunch from them the other day and I was pleasantly surprised. Our waiter dazzled me with his blue eyes and genuine smile, and all the waiters were extremely professional and efficient.
      Sequence Labeling Task
        I ordered lunch from them the other day and I was [FOOD pleasantly surprised]. Our waiter dazzled me with his blue eyes and genuine smile, and all the waiters were [SERVICE extremely professional and efficient].
  • 15. The Big Disconnect
      Discourse Modeling
        - Topic Models
        - Rhetorical Structure Analysis
      Analysis Applications
        - Information Extraction
        - Sentiment Analysis
  • 16. Approach Overview
      [Diagram: words, labels]
      Task Labels: Observed
      I had the shrimp salad and was [FOOD pleasantly surprised]. The [ATMOSPHERE decor was tasteful] and staff was [SERVICE extremely professional and efficient].
  • 17. Approach Overview
      Content Labels: Latent
      Task Labels: Observed
      Goal: Analysis applications sensitive to document structure
  • 18. Approach Overview
      •  Jointly learn structure and task parameters
         –  Topics are latent variables shaped by task
      •  Principled way to incorporate unlabeled data
         –  More unlabeled data, better performance
  • 19. Factorization
      [Equation not preserved in the transcript; its braces are labeled: Topic Trans., Bag-of-words, CRF, Product over sentences]
  • 20. Multi-Aspect Summarization
      Content Model: Sentence-Level HMM
        { ... chicken, parm, ordered, loved, ... }
      Task: Token-Level conditional random field
        Ordered chicken parm and loved it
  • 21. Augmenting CRF with Topics
      [Diagram: the sentence's topic (topic 3) is fed to the CRF]
      Add context features
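One way to read "add context features" on slide 21 is that each token's CRF feature vector is extended with the topic assigned to its sentence by the content model. The sketch below shows only that feature-extraction step. The function name `token_features`, the feature naming scheme, and the particular lexical features are hypothetical; the slide does not list the actual feature set.

```python
def token_features(tokens, i, sentence_topic=None):
    """Build a sparse feature dict for token i.

    sentence_topic is the (assumed) topic id that a content model assigned
    to the sentence; None reproduces a plain CRF without topic features.
    """
    word = tokens[i].lower()
    feats = {
        "word=" + word: 1.0,
        "prev=" + (tokens[i - 1].lower() if i > 0 else "<s>"): 1.0,
        "next=" + (tokens[i + 1].lower() if i + 1 < len(tokens) else "</s>"): 1.0,
    }
    if sentence_topic is not None:
        # Context features: the topic by itself and conjoined with the word.
        feats["topic=%d" % sentence_topic] = 1.0
        feats["topic=%d|word=%s" % (sentence_topic, word)] = 1.0
    return feats


tokens = "Ordered chicken parm and loved it".split()
base = token_features(tokens, 5)                        # features for "it"
augmented = token_features(tokens, 5, sentence_topic=3)
print(sorted(set(augmented) - set(base)))
```

With the topic attached, an otherwise uninformative token such as "it" in "loved it" carries a signal about which aspect the sentence discusses, which is the point of the motivating example on slide 13.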
  • 22. Joint Learning Objective
      [Equation not preserved in the transcript; brace label: Content and task params.]
  • 23. Joint Learning
      E-Step: Can be computed using Forward-Backward algorithm
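For readers unfamiliar with the Forward-Backward algorithm named on slide 23, here is a small sketch that computes posterior state marginals for a generic HMM. It is not the authors' implementation: the talk's E-step also involves the CRF task model, which is omitted here, and all probabilities in the example are made up. No rescaling is applied, so this version is only suitable for short sequences.

```python
def forward_backward(init, trans, emit_probs):
    """Posterior state marginals P(state_t = s | all observations) for an HMM.

    init[s]          : P(state_1 = s)
    trans[r][s]      : P(state_{t+1} = s | state_t = r)
    emit_probs[t][s] : P(observation_t | state_t = s)
    """
    n, k = len(emit_probs), len(init)

    # Forward pass: alpha[t][s] = P(obs_1..t, state_t = s)
    alpha = [[0.0] * k for _ in range(n)]
    for s in range(k):
        alpha[0][s] = init[s] * emit_probs[0][s]
    for t in range(1, n):
        for s in range(k):
            incoming = sum(alpha[t - 1][r] * trans[r][s] for r in range(k))
            alpha[t][s] = emit_probs[t][s] * incoming

    # Backward pass: beta[t][s] = P(obs_{t+1}..n | state_t = s)
    beta = [[1.0] * k for _ in range(n)]
    for t in range(n - 2, -1, -1):
        for s in range(k):
            beta[t][s] = sum(
                trans[s][r] * emit_probs[t + 1][r] * beta[t + 1][r]
                for r in range(k)
            )

    likelihood = sum(alpha[n - 1])
    return [
        [alpha[t][s] * beta[t][s] / likelihood for s in range(k)]
        for t in range(n)
    ]


# Hypothetical example: three sentences, two latent topics.
posteriors = forward_backward(
    init=[0.5, 0.5],
    trans=[[0.8, 0.2], [0.2, 0.8]],
    emit_probs=[[0.9, 0.1], [0.6, 0.4], [0.1, 0.9]],
)
for row in posteriors:
    print([round(p, 3) for p in row])
```

These per-sentence marginals are the expected counts that an M-step would then normalize.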
  • 24. Joint Learning
      M-Step:
        For [symbol not preserved]: Standard normalization of counts from E-Step.
        For [symbol not preserved]: weighted conditional likelihood objective
  • 25. Supervised Objective
      [Equation not preserved in the transcript; brace label: Labeled data for content and task parameters]
  • 26. Semi-Supervised Objective
      [Equation not preserved in the transcript; brace labels: Labeled data for content and task parameters; Unlabeled data for content parameters]
  • 27. Data set
      •  Amazon TV reviews
         –  Train: 35 reviews
         –  Test: 24 reviews
         –  Unlabeled: 12,600 reviews
      •  Yelp restaurant reviews
         –  Train: 48 reviews
         –  Test: 48 reviews
         –  Unlabeled: 33,000 reviews
  • 28. Information Extraction
      Goal: Extract phrases from review text in pre-specified categories
      Input: User-generated review text, labeled training data
      Output: Labeled phrases in each category
      Categories: FOOD, SERVICE, ATMOSPHERE, PRICE, OVERALL
      I came here with my husband for the tasting menu, and we were not disappointed. We got to sit at the chef's table, which overlooked the kitchen. The service was polite and knowledgeable, the atmosphere was elegant and energetic and the food was wonderfully creative and delicious.
  • 29. Systems
      •  NoCM: Just the CRF, no content model
      •  IndepCM: Estimate content model parameters first, then use them in the CRF.
      •  JointCM: Estimate content and CRF parameters jointly using EM
  • 30. Results
      Token F-measure Evaluation
      [Chart not preserved in the transcript]
  • 31. Impact of Unlabeled Data
      Setup: Using the Amazon corpus, fix the amount of labeled data, vary the amount of unlabeled data
      [Chart: y-axis ticks 38, 44, 50; data labels 41.5, 47.3, 47.8; x-axis: Number of Unlabeled Reviews (0, 6,300, 12,600)]
  • 32. Multi-Aspect Sentiment Ranking
      Task: Predict sentiment (1-10) for each aspect
      Aspect     Rating
      picture    9.0
      audio      9.5
      extra      7.0
      Approach:
      •  Same objective as summarization
      •  Different E- and M-Steps [See paper]
  • 33. Multi-Aspect Sentiment Ranking
      DVD Review Domain
      L2 Error: Lower is better
      [Chart not preserved in the transcript]
  • 34. Paper & Code
      •  Paper: http://groups.csail.mit.edu/rbg/code/content_structure/sauper-emnlp-10.pdf
      •  Code: http://groups.csail.mit.edu/rbg/code/content_structure/code.tar.gz
      •  Data: http://groups.csail.mit.edu/rbg/code/content_structure/data.tgz
  • 35. Agree to Disagree
      #1
        The fried oysters were very good
        The catfish tasted dry and bland and boring
        The star of the plate was the grits
      #2
        The gnocchi with mushrooms was outstanding
        The catfish approaches perfection
        The shrimp and grits are nothing less than spectacular
  • 36. Review Aggregation
      •  Hundreds of reviews for each product
      •  Opinions vary widely
         → Need to aggregate statistics
      •  Histograms show sentiment distribution, but it's not enough
  • 37. Aspect-based Analysis
      Prior work: Use a set of predefined domain-specific product aspects (e.g., Snyder and Barzilay 2007)
         → Coarse level analysis
  • 38. Informative Aggregation
      Useful information:
         –  What's the best dish at this restaurant?
         –  What do people dislike about this restaurant?
         –  Which dishes do people disagree about?
  • 39. Informative Aggregation
      Aggregation of product-specific aspects
      Japanese Restaurant
        Review 1: We had a great time last night at this restaurant. The sushi was so incredibly fresh. We had a bad experience at the bar, though. My chocolate martini was absolutely terrible. We will be back, but we'll skip the drinks.
        Review 2: Wow, I can't believe how much this place has changed! They used to be mediocre, but now they never fail to amaze. We started off at the bar with awesome sake bombs. When we got to our table, the sushi was fantastic.
        Review 3: I have such mixed things to say about this restaurant. On one hand, their sushi is unquestionably the best in the city. On the other, the atmosphere isn't that great. Plus, their drinks are completely watered down.
      Relevant aspects    User sentiment
      Sushi               100% positive
      Chicken             33% positive
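The aggregation step on slide 39 can be sketched as a simple tally: once every snippet has an aspect label and a sentiment label, the share of positive snippets per aspect follows directly. The labels below were assigned by hand from the three sample reviews. Grouping the martini, sake bomb, and watered-down-drinks remarks under a single "drinks" aspect is an assumption made for this example and is not taken from the slide.

```python
from collections import defaultdict


def percent_positive(labeled_snippets):
    """Map each aspect to the rounded percentage of its snippets that are positive.

    labeled_snippets: iterable of (aspect, sentiment) pairs,
    with sentiment given as "+" or "-".
    """
    positive = defaultdict(int)
    total = defaultdict(int)
    for aspect, sentiment in labeled_snippets:
        total[aspect] += 1
        if sentiment == "+":
            positive[aspect] += 1
    return {a: round(100.0 * positive[a] / total[a]) for a in total}


# Hand-labeled snippets from the three sample reviews (labels are assumptions).
snippets = [
    ("sushi", "+"),   # "the sushi was so incredibly fresh"
    ("sushi", "+"),   # "the sushi was fantastic"
    ("sushi", "+"),   # "their sushi is unquestionably the best in the city"
    ("drinks", "-"),  # "my chocolate martini was absolutely terrible"
    ("drinks", "+"),  # "awesome sake bombs"
    ("drinks", "-"),  # "their drinks are completely watered down"
]
print(percent_positive(snippets))
```

The hard part, which the rest of the talk addresses, is obtaining the aspect and sentiment labels automatically; the arithmetic afterwards is trivial.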
  • 40. Corpus-driven Aspect Definition
      Define aspects dynamically based on reviews
      [The slide repeats the three sample reviews from slide 39 under each of the two headings below]
      Japanese Restaurant    Bakery
      - Sushi                - Cookies
      - Sake                 - Cakes
      - Dessert              - Pies
      → Aspects specific to each product
  • 41. Corpus-driven Aspect Definition
      Allows comparison across multiple reviews
      Bakery
        Review 1: I buy all of my baked goods from this bakery. Their bread is so delicious! It's also good for all kinds of baked goods. They have some truly beautiful cakes on display. Even their cookies are great!
        Review 2: I picked up a birthday cake for my son here yesterday. It was the most amazing cake I've ever seen! The decorations were outstanding, and all the kids loved the chocolate icing. I'll definitely come back!
        Review 3: This place is nice for some baked goods, but some things are really nasty. The loaf of bread I bought was stale! They were also happy to take it back and give me another, but I'll be watching next time.
      Highlighted: "…truly beautiful cakes on display." / "…most amazing cake I've ever seen!"
         –  Consensus (both positive and negative)
            What's the best/worst aspect of this product?
  • 42. Corpus-driven Aspect Definition
      Allows comparison across multiple reviews
      Bakery
        [Same three bakery reviews as on slide 41]
      Highlighted: "Their bread is so delicious!" / "The loaf of bread I bought was stale!"
         –  Consensus (both positive and negative)
            What's the best/worst aspect of this product?
         –  Conflicts of opinion
            What aspects do people disagree about?
  • 43. Task: Input
      Input:
         –  Food-related snippets from restaurant reviews
            •  Concise description of a user's opinion
         –  Automatically extracted from full review text (Sauper et al. 2010)
            Example: "We went to the restaurant, and the sushi was incredibly fresh." (the snippet is "the sushi was incredibly fresh")
         –  Segmented by restaurant, but no additional annotation
      Japanese Restaurant
        the sushi was so incredibly fresh
        best chicken katsu in town
        drinks are fun, fresh, and delicious
      Bakery
        I'd recommend the apple pie
        the bread was disappointingly stale
        chocolate torte is the stuff of dreams
  • 44. Task: Output
      Output:
         –  Relevant aspects for each restaurant
         –  Aspect label for each snippet
         –  Sentiment label for each snippet
      Mexican Restaurant
        Burrito
          + they had a decent burrito
          − the burrito was mediocre at best
          − the burrito was heavily cilantroed
        Salsa
          + the salsa is incredible
          + the mango salsa is perfectly diced
          + hola free chips & salsa
  • 45. Possible Solution
      Use clustering based on lexical similarity
      Cluster 1
        the martinis were very good
        the martinis were tasty
        the wine list was pricey
        their wine selection is horrible
      Cluster 2
        the sushi was the best I'd ever had
        best paella I'd ever had
        the fillet was the best steak we'd ever had
        it's the best soup I've ever had
      Partial output of state-of-the-art clustering system
      Problem: Clusters and aspects are not aligned!
  • 46. Our Solution
      •  Jointly model aspect and sentiment
      •  Leverage data to distinguish sentiment and aspect
                  Bakery               Japanese
      Review 1    pies delicious       salmon fantastic
                  cookies fresh        sake smooth
      Review 2    cakes fantastic      maki beautiful
                  pies amazing         salmon fresh
      Review 3    cakes beautiful      maki delicious
                  bread stale          miso bland
  • 47. Model: Overview
      •  Each snippet has an aspect and a sentiment
      •  Each word is drawn from a topic distribution:
         –  Aspects are specific to a single product (pizza, dessert, pad thai)
         –  Sentiment is global across all products (great, horrible, amazing)
         –  Background distribution is global (was, our, food)
      •  Transition distribution encodes word topic transitions
      Example: They had wonderful appetizers.
  • 48. Model: Generative Story
      1.  Global distributions
      2.  Restaurant-level distributions
      3.  Snippet-level latent structure
      4.  Words
  • 49. Model: Generative Story
      Globally,
        a.  Background distribution
            word distribution for stop words and in-domain white noise
        b.  Sentiment distributions
            word distributions over positive and negative sentiment words
            small bias for seed words
        c.  Transition distribution
            first-order Markov distribution of word topic transitions
      Legend: Background distribution (B); Sentiment distributions (+, −); Transition distribution (Λ)
  • 50. Model: Generative Story
      For each restaurant,
        a.  Aspect distributions
            word distribution for each aspect
        b.  Aspect-sentiment binomials
            probability of positive vs. negative sentiment for each aspect
        c.  Aspect multinomial
            probability of each aspect
      Legend: Aspect distributions (1, 2, …, K); Aspect-sentiment binomials (φ1, φ2, …, φK); Aspect multinomial (ψ)
  • 51. Model: Generative Story
      For each snippet from a restaurant,
        a.  Aspect
            chosen from aspect multinomial ψ (example: aspect 2)
        b.  Sentiment
            chosen from aspect-sentiment binomial φ2 (example: +)
        c.  Sequence of word topics
            Background, Aspect, or Sentiment
            selected from transition distribution Λ (example: B A B S S)
  • 52. Model: Generative Story
      For each word,
        a.  Word
            chosen from topic-specific distribution based on word topic sequence
      Example: Aspect 2, Sentiment +
        Word topic sequence:   B     A       B     S        S
        Words:                 The   pizza   was   really   great
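The generative story on slides 48 to 52 can be made concrete by sampling from it. The sketch below follows the four steps (aspect from ψ, sentiment from the aspect's binomial φ, word topics from the transition distribution Λ, then words from topic-specific distributions). Every number and word list is a made-up placeholder, and the word distributions are taken to be uniform for simplicity; the real model learns all of these from data.

```python
import random

# All values below are hypothetical placeholders, not learned parameters.
BACKGROUND = ["the", "was", "our", "really"]                          # global
SENTIMENT = {"+": ["great", "amazing"], "-": ["horrible", "bland"]}   # global
ASPECT_WORDS = {0: ["pizza"], 1: ["dessert"], 2: ["pad", "thai"]}     # per restaurant
ASPECT_PRIOR = [0.5, 0.3, 0.2]     # psi: aspect multinomial
POSITIVE_PROB = [0.9, 0.6, 0.4]    # phi_k: P(positive | aspect k)
TRANSITIONS = {                    # Lambda: first-order Markov chain over word topics
    "START": {"B": 0.7, "A": 0.2, "S": 0.1},
    "B": {"B": 0.3, "A": 0.4, "S": 0.3},
    "A": {"B": 0.6, "A": 0.2, "S": 0.2},
    "S": {"B": 0.2, "A": 0.1, "S": 0.7},
}


def draw(dist, rng):
    """Sample one key from a {key: probability} dict."""
    keys = list(dist)
    return rng.choices(keys, weights=[dist[k] for k in keys])[0]


def generate_snippet(length, rng):
    """Sample (aspect, sentiment, word topics, words) for one snippet."""
    aspect = rng.choices(list(range(len(ASPECT_PRIOR))), weights=ASPECT_PRIOR)[0]
    sentiment = "+" if rng.random() < POSITIVE_PROB[aspect] else "-"
    topics, words, prev = [], [], "START"
    for _ in range(length):
        topic = draw(TRANSITIONS[prev], rng)
        if topic == "B":
            word = rng.choice(BACKGROUND)
        elif topic == "A":
            word = rng.choice(ASPECT_WORDS[aspect])
        else:
            word = rng.choice(SENTIMENT[sentiment])
        topics.append(topic)
        words.append(word)
        prev = topic
    return aspect, sentiment, topics, words


rng = random.Random(0)
print(generate_snippet(5, rng))
```

Inference runs this story in reverse: given only the words, it recovers the most plausible aspect, sentiment, and word topic assignments.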
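  • 53. Standard Variational Inference
      •  Desired posterior: [equation not preserved in the transcript; its terms are annotated: Observed data, Model parameters, Latent structure]
  • 54. Standard Variational Inference
      •  Desired posterior: [equation not preserved in the transcript]
      •  Optimizing directly is intractable
      •  Instead, optimize variational objective with mean-field factorization:
         [equation not preserved in the transcript] s.t. [the variational distribution] factorizes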
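The equations on slides 53 and 54 did not survive extraction. For orientation, the textbook form of mean-field variational inference is sketched below. The symbols are assumptions introduced here, not the slide's notation: x stands for the observed data, θ for the model parameters, and z for the latent structure. The two-factor split is only the simplest example of a factorization; the talk's model may factorize further.

```latex
\begin{align*}
\text{Desired posterior:}\quad
  & p(\theta, z \mid x) \\
\text{Variational objective:}\quad
  & q^{*} = \arg\min_{q}\;
    \mathrm{KL}\big(\, q(\theta, z) \,\big\|\, p(\theta, z \mid x) \,\big) \\
\text{s.t.}\quad
  & q(\theta, z) = q(\theta)\, q(z)
\end{align*}
```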
  • 55. Data Set
      Food-related snippets from Yelp restaurant reviews (Sauper et al. 2010)
         –  13,879 total snippets
         –  328 restaurants
         –  42.1 snippets per restaurant (high variance)
         –  7.8 words per snippet
      Seed words for sentiment distributions
         –  42 positive, 33 negative
         –  Relevant to domain (e.g., "delicious")
  • 56. Experiments: Aspect Clustering
      •  Gold standard
         –  Clusters over 3,250 snippets
         –  Collected via Mechanical Turk
      •  Baseline
         –  CLUTO clustering weighted by TF*IDF
      •  MUC cluster evaluation metric
         –  Based on number of cluster merges and splits required to achieve gold data
      •  Both systems allowed 10 clusters per restaurant
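The MUC metric mentioned on slide 56 counts how many links must be added or removed to turn one clustering into the other. A sketch of the commonly used link-based formulation follows; whether the experiments used exactly this variant is an assumption. Clusters are given as sets of snippet ids, and the function names are hypothetical.

```python
def muc_recall(key, response):
    """Link-based MUC recall of `response` clusters against `key` clusters.

    For each key cluster K, count the links preserved: |K| minus the number
    of pieces K is split into by the response clustering.
    """
    item_to_cluster = {}
    for idx, cluster in enumerate(response):
        for item in cluster:
            item_to_cluster[item] = idx
    numerator = denominator = 0
    for cluster in key:
        # Items absent from the response each form their own piece.
        pieces = {item_to_cluster.get(item, ("unassigned", item)) for item in cluster}
        numerator += len(cluster) - len(pieces)
        denominator += len(cluster) - 1
    return numerator / denominator if denominator else 0.0


def muc_scores(gold, predicted):
    """Return (precision, recall, F1); precision is recall with roles swapped."""
    recall = muc_recall(gold, predicted)
    precision = muc_recall(predicted, gold)
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical example with five snippet ids.
gold = [{1, 2, 3}, {4, 5}]
predicted = [{1, 2}, {3, 4, 5}]
print(muc_scores(gold, predicted))
```

In the example, snippet 3 is placed in the wrong cluster, which costs one split for recall and one merge for precision, so both come out at two thirds.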
  • 57. Experiments:  Aspect  Clustering   MUC  F1   80   75,5   69,3   70   60   Baseline   Our  model  Our  model   Our  model   the  marPnis  are  very  good   the  carrot  cake  was  delicious   the  marPni  selec(on  looked  delicious   the  best  carrot  cake  I’ve  ever  eaten   the  s’mores  marPni  sounded  excellent   carrot  cake  was  deliciously  moist   the  marPnis  are  very  good   the  carrot  cake  was  delicious   the  mozzarella  was  very  fresh   it  was  rich,  creamy,  and  delicious   the  fish  and  various  meets  were  well  made   the  pasta  bolognese  was  rich  and  robust  Baseline   Baseline   57  
  • 58. Error Analysis
      Number of sentiment and aspect errors approximately equal
      • Aspect errors
        – Similar aspect words in different contexts
      • Sentiment errors
        – Rare sentiment words
        – Negation, sometimes
      Example snippets: belgian frites are very crave-able / the blackened chicken was meh / chicken enchiladas are yummy / the cream cheese wasn’t bad / the cream cheese was n’t bad / ice cream was just delicious
  • 59. Paper & Code
      • Paper: http://groups.csail.mit.edu/rbg/code/content_attitude/sauper-acl-11.pdf
      • Code: http://groups.csail.mit.edu/rbg/code/content_attitude/code.tar.gz
  • 60. The Task
      • Goal: Automatic construction of event records from Twitter
      • Input: Stream of Twitter messages
        – Seated at @carnegiehall waiting for @CraigyFerg’s show
        – @DJPaulyD absolutely killed it at Terminal 5 last night.
        – Craig, nice seeing you #noelnight this weekend @becksdavis!
      • Output: Table of event records
        – Artist: Craig Ferguson, Venue: Carnegie Hall
        – Artist: DJ Pauly D, Venue: Terminal 5
  • 61. Example Output
        – Artist: Amos Lee, Venue: Bardavon Opera House
        – Artist: Jim Gaffigan, Venue: Best Buy Theater
        – Artist: Jeff Tweedy, Venue: Bowery Ballroom
        – Artist: Hall & Oates, Venue: Beacon Theater
        – Artist: J. Cole, Venue: Highline Ballroom
        – Artist: Sunday Gospel Brunch, Venue: B.B. King Blues Club
  • 62. IE for Social Media: Challenges
      • Messages are short ⇒ An individual message may not contain all event fields.
      • Messages are expressed in colloquial language ⇒ The mapping between messages and event records is not obvious.
      Example messages:
        – Seated at @carnegiehall waiting for @CraigyFerg’s show
        – RT @leerader : getting REALLY stoked for #CraigyAtCarnegie sat night.
      Corresponding record: Artist: Craig Ferguson, Venue: Carnegie Hall
  • 63. IE for Social Media: Opportunity
      Significant redundancy in the Twitter stream:
        – Seated at @carnegiehall waiting for @CraigyFerg’s show
        – @DJPaulyD absolutely killed it at Terminal 5 last night.
        – Craig, nice seeing you #noelnight this weekend @becksdavis!
      Approach: Drive event extraction by modeling agreement in the message stream.
  • 64. Model Functionality
      • Message level analysis: Tag words in the message with event-field labels.
      Message (x): @YonderMountain rocking Mercury Lounge
      Label (y): artist none venue venue
  • 65. Model Functionality
      • Message level analysis: Tag words in the message with event-field labels.
      • Message clustering: Group messages based on events.
      • Event records: Induce a canonical value for each field.
      Record (R), Artist: Craig Ferguson, Venue: Carnegie Hall. Messages aligned to it (A):
        – #CraigAtCarnie is starting now! #iamsoexcited
        – Craig Ferguson, what a riot! Carnegie is in stitches
      Record (R), Artist: Radiohead, Venue: Coliseum. Messages aligned to it (A):
        – Going to see Radiohead at the Coliseum tonight!
        – Pumped for R A D I O H E A D !!!
  • 66. Model Overview
      Source of supervision: Example event records
        – Alignment between records and messages not observed.
        – Message level field annotations not observed.
      Examples:
        – July 16, 5:30pm at American Folk Art Museum
        – Jun 17, 8:00 PM at Izod Center
        – Jun 17, 8:00 PM at Tarrytown Music Hall
  • 67. Model Overview
      • (y) Message level analysis
      • (A) Message clustering
      • (R) Event records
      Learn jointly in a factor graph model:
      $P(R, A, y \mid x) \propto (\text{Sequence Labeling}) \, (\text{Record Uniqueness}) \, (\text{Term Popularity}) \, (\text{Record Consistency})$
  • 68. Sequence Labeling Factor
      $\phi_{SEQ}(x, y) = \exp\{\theta_{SEQ}^{T} f_{SEQ}(x, y)\}$
      Message (x): @YonderMountain rocking Mercury Lounge
      Label (y): artist none venue venue
      • Similar to a chain CRF
      • Features on token and label
        – Wikipedia match, context, etc.
        – Example features: IsWikipediaMatch, word+1=“rocking”, IsUserMention, …
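A minimal sketch of how a log-linear factor of this shape could be scored, assuming features are indicator strings conjoined with the token's label and that the only edge feature is label-to-label. The feature names mirror the slide; the weight values, the `wikipedia_titles` set, and both function names are hypothetical and are not taken from the authors' code.

```python
import math
from typing import Dict, List, Set


def sequence_features(tokens: List[str], labels: List[str],
                      wikipedia_titles: Set[str]) -> Dict[str, float]:
    """Count indicator features over a labeled message (hypothetical feature set)."""
    counts: Dict[str, float] = {}

    def fire(name: str) -> None:
        counts[name] = counts.get(name, 0.0) + 1.0

    for i, (token, label) in enumerate(zip(tokens, labels)):
        if token.lstrip("@").lower() in wikipedia_titles:
            fire(f"IsWikipediaMatch&{label}")
        if token.startswith("@"):
            fire(f"IsUserMention&{label}")
        if i + 1 < len(tokens):
            fire(f"word+1={tokens[i + 1].lower()}&{label}")
        if i > 0:
            fire(f"{labels[i - 1]}->{label}")  # label-to-label edge feature
    return counts


def seq_factor(tokens: List[str], labels: List[str],
               weights: Dict[str, float], wikipedia_titles: Set[str]) -> float:
    """exp(theta . f(x, y)); features without a weight contribute zero."""
    features = sequence_features(tokens, labels, wikipedia_titles)
    score = sum(weights.get(name, 0.0) * value for name, value in features.items())
    return math.exp(score)


if __name__ == "__main__":
    tokens = ["@YonderMountain", "rocking", "Mercury", "Lounge"]
    labels = ["artist", "none", "venue", "venue"]
    weights = {"IsUserMention&artist": 1.0, "venue->venue": 0.5}  # made-up values
    print(seq_factor(tokens, labels, weights, set()))
```

With these made-up weights, the labeling from the slide scores higher than labeling every token "none", which is the behavior the factor is meant to reward.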
  • 69. Term Popularity Factor
      $\phi_{POP}(x, y, R_A = v) = \sum_{k} \max_{j} \mathrm{Sim}(x_j, y_j, v_k)$
      • Match each labeled message token to the best record value token
      • Token matching is IDF-weighted
      Message: Dave Matthews at Slims
      Labels: artist artist venue venue
      Record: Artist: Dave Matthews Band, Venue: Slims
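The sketch below shows one way the term popularity score could be computed. Several choices are assumptions not stated on the slide: the similarity is taken to be the IDF weight of a record value token when a message token with the right field label matches it exactly (case-insensitive) and zero otherwise, and the sum runs over record value tokens with a max over message tokens. The IDF values and the function name `term_popularity` are invented for illustration.

```python
from typing import Dict, List


def term_popularity(tokens: List[str], labels: List[str], field: str,
                    value_tokens: List[str], idf: Dict[str, float]) -> float:
    """Sum over record value tokens of the best IDF-weighted match in the message."""
    total = 0.0
    for value_token in value_tokens:  # index k in the slide's formula
        best = 0.0
        for token, label in zip(tokens, labels):  # index j in the slide's formula
            # Assumed similarity: IDF weight on an exact, case-insensitive match
            # between a message token carrying this field's label and the value token.
            if label == field and token.lower() == value_token.lower():
                best = max(best, idf.get(value_token.lower(), 0.0))
        total += best
    return total


if __name__ == "__main__":
    tokens = ["Dave", "Matthews", "at", "Slims"]
    labels = ["artist", "artist", "venue", "venue"]
    idf = {"dave": 1.0, "matthews": 2.0, "band": 0.5, "slims": 3.0}  # made-up weights
    print(term_popularity(tokens, labels, "artist", ["Dave", "Matthews", "Band"], idf))
    print(term_popularity(tokens, labels, "venue", ["Slims"], idf))
```

In this example the record token "Band" finds no match in the message and contributes nothing, while rarer tokens such as "Slims" contribute more than common ones.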
  • 70. Record Uniqueness Factor
      $\phi_{UNQ}(R) = \prod_{k \neq k'} \phi_{UNQ}(R_k, R_{k'})$
      $\phi_{UNQ}(R_k, R_{k'}) = \exp\{-\mathrm{Sim}(R_k, R_{k'})\}$
      • Discourage similar record values
      Example: Artist: Yonder Mountain Band vs. Artist: Yonder Mountain
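A small sketch of the uniqueness penalty, assuming the similarity between two record values is the Jaccard overlap of their lowercased tokens and that each unordered pair of records is counted once. Both choices are illustrative: the slide does not define the similarity function, and the names `jaccard` and `uniqueness_factor` are hypothetical.

```python
import math
from itertools import combinations
from typing import List


def jaccard(a: str, b: str) -> float:
    """Token-set overlap between two record values (assumed similarity)."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0


def uniqueness_factor(values: List[str]) -> float:
    """Product over record pairs of exp(-Sim); near-duplicates lower the score."""
    log_phi = 0.0
    for a, b in combinations(values, 2):  # each unordered pair counted once
        log_phi -= jaccard(a, b)
    return math.exp(log_phi)


if __name__ == "__main__":
    print(uniqueness_factor(["Yonder Mountain Band", "Yonder Mountain"]))
    print(uniqueness_factor(["Radiohead", "Craig Ferguson"]))
```

Two records whose artist values share most of their tokens score lower than two unrelated records, which pushes the model away from forming several clusters for the same popular event.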
  • 71. Record Consistency Factor
      $\phi_{CON}(x, y, R_A) = I[\phi_{POP}(x, y, R_A) > 0, \ \forall \ell]$, where $\ell$ ranges over the record fields
      • Encourage all record values to be in a single message
      • Active when there is some match for all record fields
      Message: Dave Matthews at Slims
      Labels: artist artist venue venue
      Record: Artist: Dave Matthews Band, Venue: Slims
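The indicator on this slide can be sketched as follows, under the assumption that a field "has some match" when at least one message token carrying that field's label equals, case-insensitively, one of the tokens of the record's value for that field. The helper names are hypothetical.

```python
from typing import Dict, List


def field_has_match(tokens: List[str], labels: List[str],
                    field: str, value: str) -> bool:
    """True if a token labeled `field` matches some token of the record value."""
    value_tokens = {token.lower() for token in value.split()}
    return any(label == field and token.lower() in value_tokens
               for token, label in zip(tokens, labels))


def consistency_factor(tokens: List[str], labels: List[str],
                       record: Dict[str, str]) -> int:
    """1 when every record field has some match in the message, else 0."""
    return int(all(field_has_match(tokens, labels, field, value)
                   for field, value in record.items()))


if __name__ == "__main__":
    record = {"artist": "Dave Matthews Band", "venue": "Slims"}
    full = (["Dave", "Matthews", "at", "Slims"], ["artist", "artist", "venue", "venue"])
    partial = (["Pumped", "for", "Dave", "Matthews"], ["none", "none", "artist", "artist"])
    print(consistency_factor(*full, record))
    print(consistency_factor(*partial, record))
```

The first message mentions both the artist and the venue, so the indicator fires; the second mentions only the artist, so it does not.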
  • 72. Inference
      • Variational mean-field inference to approximate the posterior:
      $P(R, A, y \mid x) \approx Q(R, A, y) = \left( \prod_{k=1}^{K} q(R_k) \right) \left( \prod_{i=1}^{n} q(A_i) \, q(y_i) \right)$
  • 73. Experiments: Dataset
      Twitter data: Three weekends of filtered messages:
      • Authors from New York
      • Concert-related messages (MIRA-based classifier)
      Resulting dataset: 5,800 messages
      • Training – 2,184 messages (one weekend)
      • Test – 3,662 messages (two weekends)
      Gold event records:
      • New York City events from NYC.com
      • 11 events in training, 31 events in test
  • 74. Experiment: Baselines
      Voting methodology of Mann and Yarowsky (2005):
      • Aggregate the output of baseline IE predictions for each message.
      • Select the top K events based on number of votes.
      Baseline IE predictors:
      • List baseline: String overlap with a given list of artists and venues (Wikipedia)
      • CRF Voting baseline: Extract a record for each labeled pair of fields
      • CRF Low-Threshold: CRF voting, but extract records with a lower extraction threshold
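The voting step described on this slide can be sketched in a few lines. The sketch assumes each message contributes at most one vote per distinct (artist, venue) pair and that the per-message predictions come from some upstream extractor such as the list or CRF baseline; the function name `vote_for_records` and the input format are illustrative.

```python
from collections import Counter
from typing import List, Tuple

Record = Tuple[str, str]  # (artist, venue)


def vote_for_records(per_message_predictions: List[List[Record]], k: int) -> List[Record]:
    """Aggregate per-message predictions and keep the k most-voted records."""
    votes: Counter = Counter()
    for predictions in per_message_predictions:
        for record in set(predictions):  # one vote per message per record
            votes[record] += 1
    return [record for record, _ in votes.most_common(k)]


if __name__ == "__main__":
    predictions = [
        [("Craig Ferguson", "Carnegie Hall")],
        [("Craig Ferguson", "Carnegie Hall")],
        [("DJ Pauly D", "Terminal 5")],
        [],  # a message with no extracted record
    ]
    print(vote_for_records(predictions, 1))
```

Raising K trades precision for recall, which is the axis varied in the precision and recall plots that follow.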
  • 75. Precision
      [Figure: precision (manual evaluation) against number of records kept (10 to 50). Series: Low Thresh, CRF, List, Our Work, Our Work + Con]
  • 76. Recall
      [Figure: recall against gold event records, plotted over k as a multiple of the number of gold records (1 to 5). Series: Low Thresh, CRF, List, Our Work]
  • 77. Paper & Code
      • Paper: http://people.csail.mit.edu/regina/my_papers/twitter_acl2011.pdf
      • Code: http://groups.csail.mit.edu/rbg/code/twitter
  • 78. Conclusion
      • Social media presents unique challenges and opportunities for NLP technologies
      • Linguistically rich models can compensate for the noise inherent in social media streams
      • Joint modeling of rich linguistic relations boosts prediction accuracy