The Truth about Sentiment & Natural Language Processing (NLP) by Synthesio


Sentiment analysis has been examined repeatedly in the social media monitoring sphere, with strong advocates of automatic sentiment analysis on one side and strong advocates of human analysis on the other. Synthesio, which traditionally relies on human analysts but can incorporate automatic analysis, took a look at the pros and cons of text analytics and interviewed Seth Grimes, an expert in the field.

Published in: Technology

  1. The Truth About Sentiment & Natural Language Processing, by Synthesio

Summary:
IntroEuropean (p. 2)
  2. Introduction

The web has made it possible for brands to discover what people are saying about them online, whether in mainstream media such as online newspapers and magazines or on social media. Consumers now search for opinions online before, during, and after a purchase. The next step for brands is finding out whether people are talking positively or negatively about them, and why. Some online ratings provide a number but not the reasoning behind it, and so may present only half of the story.

Numerous companies have been working on text mining, in some cases for close to 30 years, so sentiment analysis is not a new area, but it has become a hot topic thanks to social media. Social media monitoring companies, PR practitioners, and digital marketers in general have debated whether sentiment should be analyzed by man or machine. Synthesio currently uses human analysts for sentiment analysis but can add natural language processing capabilities on a case-by-case basis. Although technology is quickly closing the gap with human analysis as we advance toward what is referred to as the singularity, the best option for now appears to be combining machine and man.
  3. Artificial Intelligence’s difficulty with sentiment

One way that researchers have attempted to classify sentiment is by creating a “sentiment lexicon”

Sentiment is not analyzed via artificial intelligence, as some people may be tempted to think. Rather, it is analyzed via a systematic process that relies on a sentiment lexicon. The lexicon assigns a degree of positivity or negativity to each word in isolation, and those word-level scores are then combined to rate the article as a whole. In other words, sentiment is analyzed by considering the inherent positivity or negativity of each word that someone might use to talk about your business or products. For example, “happy” would be deemed a positive word, as would “like” and “love”. At the opposite end of the spectrum are words like “hate” and “dislike”.

There are two problems with this methodology, however. The first is that assigning positive or negative sentiment this way evaluates a word without the context around it, and very few words always carry the same positive or negative sentiment regardless of the expression in which they appear. The second is that researchers may assign different degrees of positivity to the same word. Particularly with ambiguous expressions, one researcher may be inclined to rate a word as more or less positive than another would.

Text categorization classifies articles by topic [1]

Text categorization does not classically look at the various features mentioned within one article. Sentiment analysis has traditionally been performed using technology that evaluates an article at a global level.
Within one text, however, the topic may not be linked to the descriptors. For example, take the passage: “This film should be brilliant. It sounds like a great plot, the actors are first grade, and the supporting cast is good as well, and Stallone is attempting to deliver a good performance. However, it can’t hold up.” Judged by the number of positive descriptors, the passage should be positive. Only at the end can a human identify that the final judgment is negative overall.

The dictionaries used are developed through analysis of various factors, including sentiment polarity and degrees of positivity (“like” vs. “dislike”, relatedness of topics, and so on), identifying which parts of a document contain subjective content (subjectivity detection and opinion identification), identifying which parts of a document concern the same subject before analyzing them (joint topic-sentiment analysis), and determining the political orientation of a text (viewpoints and perspectives). Other non-factual information in the text can also be taken into account. For example, six “universal” emotions [2] (anger, disgust, fear, happiness, sadness, and surprise) may be analyzed, as well as term presence, term frequency, syntax, and negation.

The majority of sentiment analysis literature has focused on text written in English

This means that, for the time being, most of the resources for automatic sentiment analysis have been developed in English and for the English language.
We discuss this with Seth Grimes, a text analytics expert, in an exclusive interview later in this document, but there have traditionally been two types of solutions. One solution for multilingual resources has been to use bilingual dictionaries to transfer the corpus, meaning finding parallels for all of the rules that were applied to the English texts. A second solution has been to apply sentiment analysis to a translated version of the text, but accuracy rates may be questionable.

[1] Opinion mining and sentiment analysis, 2008
[2] Idem
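The lexicon approach described on this slide, and its blindness to context, can be illustrated with a minimal sketch. The word list and scores below are illustrative assumptions, not a real sentiment lexicon, and `lexicon_score` is a hypothetical helper name.

```python
import re

# Hypothetical mini-lexicon: word -> polarity score (values are illustrative assumptions).
LEXICON = {
    "happy": 1.0, "like": 0.5, "love": 1.0,
    "brilliant": 1.0, "great": 1.0, "good": 0.5,
    "hate": -1.0, "dislike": -0.5,
}


def lexicon_score(text: str) -> float:
    """Sum the polarity of every lexicon word in the text, ignoring all context."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(word, 0.0) for word in words)


review = (
    "This film should be brilliant. It sounds like a great plot, "
    "the actors are first grade, and the supporting cast is good as well, "
    "and Stallone is attempting to deliver a good performance. "
    "However, it can't hold up."
)

# The total comes out positive even though a human reads the review as negative.
score = lexicon_score(review)
print(score)
```

Note that the sketch also counts the “like” in “sounds like” as a positive word, which is exactly the kind of context error the slide describes.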
  4. Seth Grimes, expert in NLP

Some companies offer sentiment analysis in one language (typically English), while others offer analysis in 10 different languages. Linguistic approaches (lexicons and dictionaries) may be used for several languages, but they have incomplete sentiment capabilities in most of them. Translating linguistic content in French or Chinese, for example, can’t possibly offer the best results.
  5. Human analysis is an obligatory step when analyzing web content

Machines are capable of deciphering meaning from large amounts of information

An advantage of automating text analysis is that computers can work through large bodies of text that are homogeneous in form and written in one language much more quickly than a human ever could. Much as macros in Excel speed up a person’s work, having algorithms process the information can accelerate sentiment analysis. To obtain high levels of accuracy, however, the text must be written using a specific vocabulary with very little variability.

Collocations and complex syntactic patterns have been found to be useful in detecting subjectivity [3]

Some technology experts have attempted to create syntactic relations within feature sets that are then tested on text corpora to “train” the software and allow it to detect subjective expressions. This is done by creating syntactic templates that are run through a training corpus, generating an extraction pattern every time a template appears. For example, “<x> pleased me” should match any time the word “pleased” is present. The technique has certain limitations, as the software then searches for specific syntactic expressions rather than exact word sequences. For sentiment analysis, then, this is only the first step: identifying whether any sentiment is present at all.

NLP has had the most success with online reviews

“Opinion-oriented information extraction” is advancing in identifying subjects in a text and their relationship with the words around them that give them their context [4].
Nouns in online reviews are particular in that they most likely, but not always, pertain to the product or service being reviewed. The context is similarly most likely, but not always, the reviewer’s opinion of that product or service. Whereas other online media, like blog posts, may express various opinions throughout one post, with positive and negative sentiment attached accordingly, online reviews typically focus on a single subject. One heuristic for NLP software has been to detect adjectives that appear in the same sentence as the feature, product, or service being evaluated. These can then be analyzed using manual or semi-manual rules or lexicons.

PR specialist KD Paine explains:

“Computers can do a lot of things well, but differentiating between positive and negative comments in consumer generated media isn’t one of them. The problem with consumer generated media is that it is filled with irony, sarcasm and non-traditional ways of expressing sentiment. That’s why we recommend a hybrid solution. Let computers do the heavy lifting, and let humans provide the judgment.” –KD Paine

[3] Learning Extraction Patterns for Subjective Expression
[4] Opinion mining and sentiment analysis, 2008
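The “<x> pleased me” template mentioned on this slide can be approximated as follows. This is only a rough sketch: the systems described match syntactic roles, whereas this example assumes a plain regular expression over the surface text, and the names `PLEASED_ME` and `find_subjective` are hypothetical.

```python
import re

# Hypothetical surface version of the template "<x> pleased me".
# Assumption: <x> is whatever run of words directly precedes "pleased me";
# a real system would use a syntactic parser rather than a regular expression.
PLEASED_ME = re.compile(r"\b(?P<x>[\w' ]+?)\s+pleased\s+me\b", re.IGNORECASE)


def find_subjective(sentences):
    """Return (sentence, filler for <x>) for each sentence the template matches."""
    hits = []
    for sentence in sentences:
        match = PLEASED_ME.search(sentence)
        if match:
            hits.append((sentence, match.group("x").strip()))
    return hits


sentences = [
    "The service pleased me.",
    "The battery lasts 2 hours.",
    "I was pleased with the room.",
]
hits = find_subjective(sentences)
print(hits)
```

Only the first sentence matches. The third sentence also expresses an opinion but does not fit this particular template, which is why a set of many templates is normally needed, and why a match only signals that subjective content is present, not whether it is positive or negative.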
  6. Current technological advances

Technology is continually progressing

Mao and Lebanon are two researchers who proposed using “isotonic conditional random fields” to analyze sentiment at the sentence level [5]. They built mathematical models to determine sentiment, given that certain words may be strongly positive or negative and thus push the “local sentiment” up or down. These could be new models for programming machines to determine sentiment within certain probabilities while also incorporating the author into the equation. Uses like these are interesting because human reviewers do not always agree, either.

“I, for one, welcome our new computer overlords” – Ken Jennings, Jeopardy contestant

Watson, a question-answering computer developed by IBM, made history this year on Jeopardy, an American game show renowned for its difficult question-and-answer format, by appearing against two top champions. Contestants typically study volumes of encyclopedias in order to reach the final round, but IBM put its supercomputer Watson to the test, and he won. Programmed to buzz in according to the level of certainty he had for each question, Watson was also trained to answer in the form of a question and to decipher the complex language that goes into a game of Jeopardy. The category names are often puns, as are the “answers” (which serve as questions) [6]. IBM proved that its technology has advanced to the point where it can intelligently parse language and weigh different parts of a phrase.
Researchers scanned some 200 million pages of content (the equivalent of about one million books) into the system but were unable to teach it to avoid all traps. During the practice session, one wrong answer led to a string of wrong answers as the machine veered in the wrong direction.

The web is made up of many different types of media, both mainstream and social

Some online media are more “fact-based,” such as newspapers or general news, while others are inherently more “opinion-based,” like Twitter, Facebook, and forums. Still others, like blogs, may be one or the other. All of this makes it difficult for automated sentiment analysis technology to differentiate between subjective and objective information. For example, compare the sentence “the battery lasts 2 hours” with “the battery only lasts 2 hours”: the second implies a sentiment that the first does not.

Social media has also engendered new forms of expression through “SMS-like” writing that makes text analysis more complicated. Emoticons may or may not help, and slang is more common in social media, along with misspellings, bad grammar, and poor syntax such as missing or added characters. Take, for example, “oh my gooooooood WTF did you see Biebur’s concert? It was aewsome! I lved it.” New forms of association and ways of depicting negative sentiment have also arisen, including ironic or sarcastic phrasing.
Examples include “Another winner from the almighty Microsoft” or, most recently, “Charlie Sheen is a winner.”

[5] Isotonic Conditional Random Fields and Local Sentiment Flow
[6] IBM and the Jeopardy! Challenge - Video - Wired
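One small part of the “added characters” problem described on this slide can be handled before analysis. The sketch below assumes a simple rule (collapse any character repeated more than twice in a row); `squeeze_elongation` is a hypothetical helper, and it does nothing for misspellings such as “aewsome” or “lved”, which would need a spelling corrector.

```python
import re


def squeeze_elongation(text: str, max_repeat: int = 2) -> str:
    """Collapse any character repeated more than max_repeat times in a row."""
    pattern = re.compile(r"(.)\1{%d,}" % max_repeat)
    return pattern.sub(lambda m: m.group(1) * max_repeat, text)


cleaned = squeeze_elongation("oh my gooooooood WTF")
print(cleaned)
```

The elongated word comes out as “good” rather than “god”, which shows how even a simple cleanup rule can change the apparent meaning and why such preprocessing still needs human review.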
  7. Automated sentiment analysis cannot understand sentiment in the context of your business goals

One challenge that many proponents of automation have struggled to address is analyzing text in the context of a business. For example, “Nike reports that profits rose” and “Adidas reports that profits rose” are both positive sentences when evaluated with no context. If Nike is the firm listening to social media, however, the second sentence is suddenly not as positive. The “goodness” or “badness” depends on whether the client is Nike or Adidas.

Beyond whether the information is positive or negative for a client, automated text analysis may extract information that the company already knows or does not wish to focus on. The level at which machines can decipher meaning is often limited to what brands already know: if a machine is told to analyze the top trends around a brand, it may surface information the brand already has.

Automated analysis is limited in analyzing sentiment for several topics within an article

Only now are certain technologies emerging that can analyze sentiment at a feature level. In general, automated sentiment analysis technology has difficulty distinguishing the sentiment attached to one topic from that attached to another, particularly if more than one is mentioned in the same sentence. A blog post may be positive in the first sentence and negative in the second, or it may have one overall sentiment with both positive and negative comments.
“Much work on analyzing sentiment and opinions in politically oriented text focuses on general attitudes expressed through texts that are not necessarily targeted at a particular issue or narrow subject.” [7] A blogger, for example, may compare two or more products within the same post. Posts on a forum are often responses to earlier posts, and the lack of context makes it difficult for machines to decipher whether a post agrees or disagrees.

[7] Opinion mining and sentiment analysis, 2008
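The Nike and Adidas example on this slide can be sketched as a re-signing step applied after a context-free classifier. The rule used here (flip the sign for a competitor) is a deliberate simplification and an assumption: the text only says the sentence becomes “not as positive” for Nike. `Mention` and `polarity_for_client` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Mention:
    brand: str      # brand the sentence is about
    polarity: int   # context-free polarity: +1 positive, -1 negative, 0 neutral


def polarity_for_client(mention: Mention, client: str, competitors: set) -> int:
    """Re-sign a context-free polarity from the point of view of the listening client."""
    if mention.brand == client:
        return mention.polarity
    if mention.brand in competitors:
        # Assumption: good news for a competitor counts against the client.
        return -mention.polarity
    return 0  # Assumption: mentions of unrelated brands are treated as neutral.


nike_news = Mention(brand="Nike", polarity=1)      # "Nike reports that profits rose"
adidas_news = Mention(brand="Adidas", polarity=1)  # "Adidas reports that profits rose"

print(polarity_for_client(nike_news, "Nike", {"Adidas"}))
print(polarity_for_client(adidas_news, "Nike", {"Adidas"}))
```

The same two sentences receive opposite scores once the client is known, which is information that a context-free classifier has no way to supply on its own.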
  8. The future of semantic technology

An interview with Seth Grimes, an “Analytics visionary”

“Watson,” the IBM computer that won on the game show Jeopardy, created a huge buzz around “his” technology. Why do you think there was so much buzz?

Getting a computer to play Jeopardy was a great stunt. IBM made the technology do something that everyone can understand. It was a “stunt,” however, because the ability to win Jeopardy is not in high demand in business or society. Nonetheless, Watson’s Jeopardy playing helps the non-technologist public understand the potential and the reality of the technology.

Question-answer systems are already out there, automating responses to business questions (for instance, for contact-center support, customer inquiries, and online commerce) with no requirement for a live person on the line. Right now Watson is focused on extracting factual information, but the technology could be working on sentiment via a sentiment “annotator.” Then we won’t be limited to asking questions about facts. We’ll be able to ask about opinions and emotions.

(An annotator analyzes text and marks up features in the text with meaning or attributes. For example, a named-entity annotator finds geographic locations and “marks them up,” finding semantic meaning. Annotators for pattern-based entities can find addresses and identity or location numbers by looking for patterns, and other annotators can mark up other parts of the text.)

How accurate can this technology be?

Accuracy goals, and the amount of work you put into meeting them, should be decided in light of the business problem.
Some problems will be solvable even with low levels of precision (e.g., positive versus negative sentiment classification), while you might need higher precision for other applications. “Recall,” the ability to identify all applicable cases, is also factored into accuracy measurements.

My impression is that most sentiment tools that extract entities have out-of-the-box accuracy (without training) of something like 40-50% but can be “trained” (by having humans create marked-up samples or language rules, or correct the tool) to reach above the 80% level. I saw one claim of 98% accuracy, which is laughable and ludicrous. The only way you can do this is by highly restricting the problem, tailoring the solution, and being more lenient about what counts as accurate.

What matters most is first identifying that there is sentiment there at all, without even identifying whether it is positive or negative, and then passing materials on for human or machine classification. With machine filtering and humans analyzing, you can achieve high levels of accuracy for certain problems. If you really want the machine to do everything, you need to do a lot more work or you will get much lower levels of accuracy overall, but again, decisions should be made based on business needs and also the nature of the source materials.

Let me add that while tools that analyze only at the message or document level may be accurate, the results they produce will often be far less than useful. Think about it.
It might be helpful if you’re running, say, a hotel group with 4,200 hotels, to know that (making up numbers) 77% of reviews were overall positive, 17% neutral, and 6% negative. Wouldn’t it be far more helpful to know opinion details, by hotel? You want to know when a reviewer found that room cleanliness and staff friendliness were exemplary but that noise was a problem. The details in a net positive review are not typically going to be all positive, and only by knowing sentiment at a detailed “feature” level can you reinforce what’s great and correct what’s not.

By the way, let’s not overstate the accuracy of human sentiment analysis. The best study I’ve seen of accuracy was done at the University of Pittsburgh in 2005. They found only 82% human agreement in annotating for sentiment. Results jumped to over 90% when they removed uncertain cases (cases where people said they weren’t sure).
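The accuracy terms used in this answer (precision, recall, and agreement between human annotators) can be made concrete with a short sketch. The labels below are made-up toy data, not results from any study, and both function names are hypothetical.

```python
def precision_recall(predicted, actual, label="positive"):
    """Precision and recall for `label`, given parallel lists of predicted and true labels."""
    true_pos = sum(1 for p, a in zip(predicted, actual) if p == label and a == label)
    predicted_pos = sum(1 for p in predicted if p == label)
    actual_pos = sum(1 for a in actual if a == label)
    precision = true_pos / predicted_pos if predicted_pos else 0.0
    recall = true_pos / actual_pos if actual_pos else 0.0
    return precision, recall


def agreement(labels_a, labels_b, skip="unsure"):
    """Share of items two annotators label identically, ignoring items either marked `skip`."""
    pairs = [(a, b) for a, b in zip(labels_a, labels_b) if skip not in (a, b)]
    return sum(a == b for a, b in pairs) / len(pairs) if pairs else 0.0


# Toy data (assumed for illustration only).
tool = ["positive", "positive", "negative", "positive", "negative"]
truth = ["positive", "negative", "negative", "positive", "positive"]
precision, recall = precision_recall(tool, truth)

annotator_1 = ["positive", "negative", "unsure", "positive"]
annotator_2 = ["positive", "positive", "negative", "positive"]
with_unsure = agreement(annotator_1, annotator_2, skip=None)
without_unsure = agreement(annotator_1, annotator_2)
print(precision, recall, with_unsure, without_unsure)
```

In the toy data, dropping the item one annotator marked as unsure raises the agreement rate, which is the same direction of effect the interview describes when uncertain cases were removed.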
  9. Are there certain online channels (among forums, blogs, Twitter, etc.) that are easier to analyze using text mining than others?

To really do it well you have to go to the feature level (to the individual item). You need strong natural language processing (NLP) to do that right.

Twitter is interesting because it is very hard to express more than one idea in a given tweet. Most tweets focus on a single idea, which, in theory, should make them easy to analyze. The problem is that people use a lot of slang and abbreviation, which makes a tweet harder to analyze than a blog post or article. Also, a tweet is often part of a conversation. Very few tweets stand on their own; many include an article link or are responses to someone, for example. Others are part of multi-way conversations, and you very often need to understand the whole conversation to get the context. Most of the tools that are out there don’t do that; they don’t reach “through the tweet” to take into account the threaded nature of Twitter conversations. The more text there is, the easier it is to analyze, but at the same time, the shorter it is, the more focused it’s going to be.

But let’s move from ease of analysis to business value delivered. Applications like Synthesio’s get a lot of visibility because so many people use social media, but customer service is the sentiment-analysis application that has probably delivered the clearest business benefits, the greatest business value.
Contact centers and surveys provide important data that is more focused than material out on the web and is associated with actual customers and transactions. You’ll get greater benefit from tying customer feedback to social media data than from spending your funds broadly listening to people who are expressing opinions in a void, without context.

There’s no denying the potential benefit in broad social-media monitoring and engagement, however. People will tell you what they like about your product (or don’t) and will post things that can be analyzed and shown to be indicators of their intent (to buy, to complain, to cancel their service, etc.). This information can be used to fix problems: the customer-service scenario. Answering a customer to make that person happy can turn them into a “net promoter,” and the information can be used to improve quality so the problems don’t happen to other people. Posted and analyzed information, in the form of intent signals that go beyond polarity (positive/negative), can also be used by companies to identify and act on opportunities. This is engagement that does more than react to particular comments about products and services. It’s engagement that proactively creates new and higher-value customers.

What recent advances have you seen in sentiment analysis technology?

The latest advances in analysis do go beyond “polarity” or “valence” (positive, negative, neutral), and I don’t just mean rating sentiment on a scale from -10 to +10 to capture “intensity.” That is an advance, but we can do more.
For example, you might look at sentiment in terms of emotional categories such as “angry,” “sad,” or “happy” about a hotel service. I’m sure we can all think of ways that automated understanding of emotional tone can be useful in business contexts. Then there are the “intent signals” I was just discussing: sentiment as an indicator of plans or actions.

You’re going to get the most flexibility in creating business-suited categorizations via statistical approaches. That is, the analyst sets up categories that make sense and drags and drops documents into the different categories for “training” purposes. The machine uses statistical similarity measures to discover what the items in a category have in common in order to automate classification.

Further, the market is beginning to understand that influence is best measured by ability to affect business. Certainly influence is correlated with the number of Facebook friends, Twitter followers, and retweets, but what should interest us far more is how those measures translate into inquiries, sales, and monetizable perceptions. A person is influential for real if he or she drives business transactions.
And the market is understanding just how shallow many of the listening tools are: they treat social media as a silo, completely unlinked to enterprise systems and actual business transactions; they use simple keyword lists for sentiment classification; and they apply sentiment analysis only at the message, article, or document level. The market is also understanding that these tools can and should do better, including by joining the abilities of humans, who judge and discern, with the power of machines, which are fast, work 24 hours per day, and can tap huge volumes of social, online, and enterprise information that are beyond human analysis regardless of cost.
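The statistical approach described in this answer (an analyst sorts sample documents into categories and the machine classifies new ones by similarity) can be sketched with word counts and cosine similarity. This is one simple similarity measure chosen for illustration, not necessarily what any commercial tool uses; the categories, sample texts, and function names are all assumptions.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Count lowercase whitespace-separated tokens."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def train(examples):
    """examples: category -> list of texts. Returns one summed word-count vector per category."""
    profiles = {}
    for category, texts in examples.items():
        total = Counter()
        for text in texts:
            total.update(bag_of_words(text))
        profiles[category] = total
    return profiles


def classify(text, profiles):
    """Assign the category whose profile is most similar to the text."""
    vector = bag_of_words(text)
    return max(profiles, key=lambda category: cosine(vector, profiles[category]))


# Hypothetical training documents an analyst might drop into each category.
profiles = train({
    "noise": ["the room was noisy at night", "street noise kept us awake"],
    "cleanliness": ["the room was spotless and clean", "clean bathroom and fresh towels"],
})
label = classify("very noisy street outside the room", profiles)
print(label)
```

The categories here are business-defined rather than fixed to positive and negative, which is the flexibility the interview attributes to statistical approaches.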
  10. Conclusion

No social media monitoring vendor would dare to pretend that technology can accurately (or even near-accurately) assess sentiment on a specific topic. At the subtopic level (such as what we do at Synthesio), it is completely impossible. However, NLP can at least help identify trends at a macro level, such as hot topics or aggregate changes in sentiment over time. The theory is that even if the sentiment marking is inaccurate (even by an order of magnitude), by tracking and trending it over time we can watch the pattern for changes, on the assumption that the level of inaccuracy stays consistent over time. However, there is no proof of this yet.

About Synthesio

Synthesio is a global, multilingual social media monitoring and research company that uses a powerful hybrid of tech and human monitoring services to help brands and agencies collect and analyze consumer conversations online. The result is actionable analytics and insights that provide an accurate snapshot of a brand and help answer the ultimate questions: how are we really doing right now, and how can we make it better? Founded in 2006, the company has grown to include analysts who provide native-language monitoring and analytic services in over 30 languages worldwide. Brands such as Toyota, Microsoft, Sanofi, Accor Hotels, Orange Telecom and many other well-known companies turn to Synthesio for the data they need to engage with their markets, anticipate and prepare for emerging crisis situations, and prepare for new product or campaign launches.
WWW.SYNTHESIO.COM