A Fine-Grained Analysis of User-Generated Content to Support Decision Making

User-generated content (UGC) such as online reviews is freely available on the web. This kind of data has been used to support clients' and managers' decision making in several industries, e.g. books, tourism, or hospitality. In this workshop, I will introduce a fine-grained characterisation of UGC and a new multidomain and multilingual conceptual data model to represent UGC.
Moreover, I will present a domain-specific ontology for accommodations that can also be used to support managerial decision making and end-user applications. Instead of the few categories commonly provided by Web 2.0 portals, this ontology enables accommodation managers to find specific information. The ontology is also used as input for an algorithm to recognise sentiment in online reviews. Finally, I will describe some of the main approaches to sentiment analysis.
In short, I will address some of the main challenges of UGC introducing:
a) A proposal for a fine-grained characterisation of UGC;
b) A structured representation of UGC which leverages the information provided by the use of Web 2.0 applications;
c) The main approaches to perform sentiment analysis;
d) An ontology to represent knowledge in the accommodation sector.

  1. Workshop: A Fine-Grained Analysis of User-Generated Content to Support Decision Making
     Marcirio Silveira Chaves
     http://mchaves.wikidot.com
     Information Systems Research Group
     Business and Information Technology Research Centre (BITREC)
     Institute for Scientific and Technological Research of Universidade Atlântica (ISTR)

  2. User-Generated Content (UGC)
     •  Also known as
        –  User-Generated Data
        –  User-Created Content
        –  User-Contributed Data
        –  Consumer-Generated Media
     •  Can be expressed through
        –  Opinions
        –  Reviews
        –  Comments
        –  Posts
        –  …
     •  Notes:
        –  All the examples described in this workshop are real data.
        –  Some papers mentioned here are under review.
     •  Color legend:
        –  Examples
        –  Positive feature
        –  Negative feature
     Apr-18-12  Marcirio Chaves - marcirioc@uatlantica.pt

  3. Example of UGC
     •  An opinion posted on Facebook, Dec-10-2011, 12:30 pm
        –  “would highly recommend Infinity Motorcycles, Southampton for all motorbiking gear. Very reasonable people. Earlier they gave me a full money back for a unused (after explaining why it was unused) ladies motorbike jacket (no defects whasoever) and today the zipper on my new jacket was broken and they gave me a brand new one (no questions asked, no receipt business and no fuss created). Five Star service.”
        –  This user had 226 friends.

  4. Some statistics about UGC
     •  More than 50% of all internet visits are now to UGC/social media sites.
     •  More than 75% of time spent on the internet is "social".
     •  Facebook now captures as much time spent on the internet as Google, Yahoo, and AOL.
     •  More than 80% of consumers are influenced by Social Marketing.
     Source: http://www.bbrisco.com/2010/05/social.html

  5. Main Objectives of this Workshop
     •  In-depth analysis of UGC
     •  Use UGC to support decision making
     •  Study a domain ontology to support Artificial Intelligence tasks
     •  Address approaches for sentiment analysis
     •  From theory to practice: Hands-on Session

  6. Outline
     Part 1
     •  Workshop Context
     •  User-Generated Content (UGC)
     •  Characterisation of UGC
     •  Knowledge Engineering - Ontology Development
     •  Hands-on Session (Individual Task): Dealing with UGC
     Part 2
     •  Sentiment Analysis/Opinion Mining
     •  Polarity Recognizer in Portuguese (PIRPO)
     •  Information Visualisation

  7. Context Workshop
     A framework for Customer Knowledge Management based on Social Semantic Web.
     Chaves, Marcirio Silveira; Trojahn, Cássia and Pedron, Cristiane Drebes. A Framework for Customer Knowledge Management based on Social Semantic Web: A Hotel Sector Approach. In: Customer Relationship Management and the Social and Semantic Web: Enabling Cliens Conexus. Colomo-Palacios, R.; Varajão, J. and Soto-Acosta, P. (Eds.). p. 141-157, Hershey, PA: IGI Global, 2012. ISBN: 978-161-35-0044-6

  8. A Fine-grained Analysis of UGC
     •  The overall opinion about a topic is only part of the information of interest.
     •  Document-level sentiment classification fails to detect sentiment about individual aspects of the topic. In reality, for example, although someone could be generally happy with his car, he might be dissatisfied with the engine noise.
     •  To manufacturers, these individual weaknesses and strengths are just as important to know as the overall satisfaction level of customers, or even more valuable (Tang et al. 2009).

  9. UGC
     An opinion is simply a positive or negative sentiment, view, attitude, emotion, or appraisal about an entity or an aspect of the entity (Hu and Liu, 2004; Liu, 2006) from an opinion holder (Bethard et al., 2004; Kim and Hovy, 2004; Wiebe et al., 2005).

  10. Characterisation of UGC
     •  Opinion's Characterisation
        –  I use and extend the definition proposed by (Ding et al., 2008; Liu, 2010; Martin and White, 2005) to analyse the sentences of reviews.
        –  Let the review be r.
        –  In the most general case, r is characterised as a set of the following elements {O, F, SO, H, S, A, R, I, SG}, where:

  11. Characterisation of UGC
     •  Opinion's Characterisation
        –  O: Object
        –  F: Feature
        –  SO: Semantic-Orientation
        –  H: Holder
        –  S: Source
        –  A: Attitude
        –  SG: Suggestion
        –  R: Recommendation
        –  I: Intention

  12. Characterisation of UGC
     1 - Object (O)
        –  An object is a product (e.g. movie and book) or a service (e.g. hotel and restaurant) under review, which is composed of features.
        –  Objects are also called entities.
     2 - Feature (F)
        –  A feature is a component or part of an object.
           •  actor and photography are features of a movie.
           •  pool and staff are features of a hotel.
        –  Features are also called attributes or facets.
        –  A feature can be mentioned explicitly or implicitly in a review (Ding et al. 2008).

  13. Characterisation of UGC
     2.1 - Explicit Feature (F)
        –  If a feature f appears in review r, it is called an explicit feature in r.
        –  The hotel is located very near the center city.
           •  location is an explicit feature.
     2.2 - Implicit Feature (F)
        –  If f does not appear in r but is implied, it is called an implicit feature in r.
        –  Hotel is far from public transportation.
           •  location is an implicit feature.

  14. Characterisation of UGC
     3 - Sentence-Orientation (SO)
        –  A review consists of a sequence of sentences r = 〈s1, s2, …, sm〉 (Ding et al., 2008).
        –  A sentence can be evaluated from the following perspectives:

  15. Characterisation of UGC
     3.1 Objectivity
        –  An objective sentence contains or mentions facts.
           •  This hotel is far from the airport, ca. 15km.
        –  A subjective sentence does not mention any fact.
           •  The parking could be free.
     3.2 Polarity
        –  It describes the orientation present in a sentence (i.e. positive, negative, neutral or irrelevant).

  16. Characterisation of UGC
     3.3 Intensity (strength of the polarity)
        –  It refers to the strength of the private state that is being expressed, in other words, how strong an emotion or a conviction of belief is (Wilson, 2008).
        –  It describes how intense the experience of using a product or service was:
           •  very positive, positive, neutral, negative and very negative.
           •  "Very kindly staff." refers to a very positive impression of the staff service.

  17. Characterisation of UGC
     4 - Opinion Holder (H)
        –  The holder of a particular opinion is the person or the organisation that holds the opinion (Ding et al., 2008).
        –  A holder is identified by demographic characteristics (e.g. name, city and country).
        –  Sites such as tripadvisor.com and booking.com classify holders into types including:
           •  families with older children
           •  families with young children
           •  mature couples
           •  groups of friends
           •  solo travellers
           •  young couples

  18. Characterisation of UGC
     5 - Source
        –  An information source is a web site which provides a set of reviews.
           •  tripadvisor.com
           •  booking.com
           •  amazon.com
     •  A: Attitude
     •  SG: Suggestion
     •  R: Recommendation
     •  I: Intention
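
The elements {O, F, SO, H, S, A, R, I, SG} introduced on the characterisation slides could be held in a simple record. The following is only a minimal sketch: the class names, field names and the -2..+2 orientation scale are assumptions made for illustration and are not the conceptual data model mentioned in the workshop abstract.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FeatureOpinion:
    """One feature (F) of the object with its semantic orientation (SO)."""
    feature: str
    semantic_orientation: int   # assumed scale: -2 (very negative) .. +2 (very positive)
    explicit: bool = True       # False when the feature is only implied


@dataclass
class Review:
    """Illustrative container for the elements {O, F, SO, H, S, A, R, I, SG}."""
    obj: str                                    # O: object under review
    holder: str                                 # H: opinion holder
    source: str                                 # S: site providing the review
    opinions: List[FeatureOpinion] = field(default_factory=list)  # F with SO
    attitude: Optional[str] = None              # A
    recommendation: Optional[str] = None        # R
    intention: Optional[str] = None             # I
    suggestion: Optional[str] = None            # SG


review = Review(
    obj="hotel",
    holder="solo traveller",
    source="tripadvisor.com",
    opinions=[
        FeatureOpinion("staff", 2),                      # "Very kindly staff."
        FeatureOpinion("location", -1, explicit=False),  # "Hotel is far from public transportation."
    ],
)

# Features the holder was unhappy about
negative = [o.feature for o in review.opinions if o.semantic_orientation < 0]
```

Keeping one orientation per feature, rather than one per review, is what makes the fine-grained analysis argued for on slide 8 possible.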

  19. Outline
     Part 1
     •  Workshop Context
     •  User-Generated Content (UGC)
     •  Characterisation of UGC
     •  Knowledge Engineering - Ontology Development
     •  Hands-on Session (Individual Task): Dealing with UGC
     Part 2
     •  Sentiment Analysis/Opinion Mining
     •  Polarity Recognizer in Portuguese (PIRPO)
     •  Information Visualisation

  20. Limitations for representing knowledge in the accommodation sector
     language?

  21. More limitations
     •  Currently, web agents are unable to answer questions such as:
        –  Which hotels in Rome have the longest opening hours for the indoor swimming pool?
        –  Which hotels in Lisbon have the cheapest breakfast?
        –  Which are the cheapest hotels in Barcelona with a family suite with sea view?

  22. Knowledge Engineering
     •  Ontology as a support to evaluate UGC
        –  Set of concepts for a specific domain
        –  Human and machine readable
        –  Support for fine-grained analysis of the instances (e.g. reviews)
        –  Hontology (H stands for hotel, hostal and hostel)
           •  A robust, coherent and multilingual representation of the accommodation sector.

  23. Context Workshop
     A framework for Customer Knowledge Management based on Social Semantic Web.
     Chaves, Marcirio Silveira; Trojahn, Cássia and Pedron, Cristiane Drebes. A Framework for Customer Knowledge Management based on Social Semantic Web: A Hotel Sector Approach. In: Customer Relationship Management and the Social and Semantic Web: Enabling Cliens Conexus. Colomo-Palacios, R.; Varajão, J. and Soto-Acosta, P. (Eds.). p. 141-157, Hershey, PA: IGI Global, 2012. ISBN: 978-161-35-0044-6

  24. Knowledge Engineering
     •  Development Methodology
        –  Identify existing ontologies on related domains
        –  Select the main concepts and properties
        –  Organize concepts and properties hierarchically into categories
        –  Translate the ontology (manual)
        –  Expand concepts and properties based on comments
        –  Translate the new concepts and properties (manual)
        –  Generate the ontology in several formats
     Chaves, M. S. and Trojahn, C. Towards a Multilingual Ontology for Ontology-driven Content Mining in Social Web Sites. Proc. of the ISWC 2010 Workshops, Volume I, 1st International Workshop on Cross-Cultural and Cross-Lingual Aspects of the Semantic Web. Shanghai, China, November 7th, 2010.

  25. Knowledge Engineering
     •  Hontology
        –  A multilingual ontology for the accommodation sector.
     •  Demo Protégé
     Chaves, M. S.; Freitas, L. A. and Vieira, R. (2012). Hontology: A multilingual ontology for the accommodation sector. 4th International Conference on Knowledge Engineering and Ontology Development, Barcelona, Spain, 4-7 October. (Submitted)

  26. Knowledge Engineering
     Preliminary Hontology Statistics

     Metrics                                  Value
     Number of Concepts                         285
     Number of Object Properties                 10
     Number of Data Properties                   31
     Concept Axioms
        Subconcept axioms                       270
        Equivalent concepts axioms                4
        Disjoint concepts axioms                 93
     Object Property Axioms
        Functional object property axioms         6
        Object property domain axioms            11
        Object property range axioms              8
     Data Property Axioms
        Functional data property axioms          12
        Object data domain axioms                17
        Object data range axioms                  1

  27. Hands-on Session
     •  The aim of this hands-on session is to let you think in depth about UGC in the context of the accommodation sector.
     •  You are going to receive a set of 4 or 5 reviews about accommodations and should evaluate each one according to the following parameters:
        –  Features present in the review (see the concepts of Hontology)
        –  Intensity (Strength of the Polarity): (very negative, negative, neutral, positive, very positive)
     •  Notes:
        –  Evaluate one feature per line.
        –  Please save your sheet in another file and send it to mschaves@gmail.com. Subject: UB:GX
        –  X = number of the group.

  28. Outline
     Part 1
     •  Workshop Context
     •  User-Generated Content (UGC)
     •  Characterisation of UGC
     •  Knowledge Engineering - Ontology Development
     •  Hands-on Session (Individual Task): Dealing with UGC
     Part 2
     •  Sentiment Analysis/Opinion Mining
     •  Polarity Recognizer in Portuguese (PIRPO)
     •  Information Visualisation

  29. Sentiment Analysis
     •  Analysis and automatic extraction of Semantic Orientation
     •  Semantic orientation refers to the polarity and strength of words, phrases, or texts.
     •  Approaches
        –  Lexicon-based
           •  Dictionaries of words annotated with the word's semantic orientation, or polarity.
           •  A manually built dictionary provides a solid foundation for a lexicon-based approach (Taboada et al., 2011).
        –  Statistical or Machine-learning
           •  Supervised classification

  30. Sentiment Analysis
     •  Lexicon-based Approach
        –  Sentiment-bearing words: a list of nouns, verbs, adjectives and adverbs.
           •  Chesley et al. (2006) use verbs and adjectives to classify English opinionated blog texts.
        –  List of conjunctions and connectives (Liu, 2010).
        –  Use of auxiliary verbs to get features and opinion-oriented words about products from texts (Khan et al., 2010).
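
A lexicon-based classifier can be sketched in a few lines: look each word up in a dictionary annotated with semantic orientation and add up the values. The tiny lexicon below and its scores are made up for illustration (the adjectives come from the examples on these slides); a real system would load a manually built dictionary.

```python
import re

# Hypothetical mini-lexicon: word -> semantic orientation (assumed values)
LEXICON = {
    "good": 1, "excellent": 2, "clean": 1,
    "awful": -2, "boring": -1, "terrible": -2,
    "bad": -1, "expensive": -1,
}


def tokenize(text):
    """Lower-case the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def sentence_orientation(sentence):
    """Sum the orientation of every lexicon word found in the sentence."""
    return sum(LEXICON.get(token, 0) for token in tokenize(sentence))


def polarity_label(score):
    """Map a numeric score to a coarse polarity label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


score = sentence_orientation("This restaurant has a bad and expensive food.")
label = polarity_label(score)
```

Words missing from the lexicon contribute nothing, which is why the coverage of the dictionary matters so much for this family of approaches.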

  31. Sentiment Analysis
     •  Seed words
        –  are a small set of words with strong negative or positive associations, such as excellent or abysmal.
        –  In principle, a positive adjective should occur more frequently alongside the positive seed words, and thus will obtain a positive score, whereas negative adjectives will occur most often in the vicinity of negative seed words, thus obtaining a negative score (Taboada et al. 2011).
           •  This restaurant has a bad and expensive food.
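
The seed-word idea can be illustrated with plain co-occurrence counts. This is a deliberately simplified sketch, not the association measure used in the literature cited above: the corpus, the seed sets and the "same sentence" notion of vicinity are all assumptions.

```python
import re

POSITIVE_SEEDS = {"excellent"}
NEGATIVE_SEEDS = {"abysmal"}


def seed_score(word, sentences):
    """Count co-occurrences with positive seeds minus negative seeds.

    Two words co-occur when they appear in the same sentence (assumed window).
    """
    score = 0
    for sentence in sentences:
        tokens = set(re.findall(r"[a-z]+", sentence.lower()))
        if word in tokens:
            score += len(tokens & POSITIVE_SEEDS) - len(tokens & NEGATIVE_SEEDS)
    return score


# Made-up corpus for illustration only
corpus = [
    "Excellent and spacious room.",
    "The spacious lobby was excellent.",
    "Abysmal, noisy room.",
    "Noisy street and abysmal service.",
]

spacious_score = seed_score("spacious", corpus)
noisy_score = seed_score("noisy", corpus)
```

With enough text, adjectives that tend to appear near positive seeds drift towards positive scores and vice versa, which is how a small seed set can bootstrap a larger lexicon.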

  32. Sentiment Analysis
     •  Part-of-Speech (PoS)
        –  In order to evaluate a sentence in a review, we should consider the parts of speech it contains, such as adjectives, adverbs and verbs.
        –  Adjectives are classified as:
           •  positive (good, excellent and clean),
           •  negative (awful, boring and terrible),
           •  neutral (regular and indifferent) and
           •  dual, which can express a positive or a negative opinion (small, long).
        –  In some approaches nouns are represented by concepts of a domain ontology and mapped as features.

  33. Sentiment Analysis
     •  Conjunction and Connective (CC)
        –  Connectives are words that help identify additional adjective opinion words and their orientations.
        –  One of the constraints is about conjunction (i.e. and), which says that conjoined adjectives usually have the same orientation (Liu, 2010).
           •  This room is beautiful and spacious.
              –  if beautiful is known to be positive, it can be inferred that spacious is also positive.
        –  Heuristic:
           •  People usually express the same opinion on both sides of a conjunction.
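
The conjunction constraint above can be sketched as a small propagation step: when two words are joined by "and" and only one has a known orientation, copy it to the other. The function name is hypothetical, and the pattern matches any two words around "and" (it does not check that they are adjectives), so this is an illustration rather than a usable extractor.

```python
import re


def propagate_over_and(sentence, known):
    """Return a copy of `known` extended with orientations inferred across 'and'."""
    inferred = dict(known)
    pairs = re.findall(r"\b([a-z]+) and ([a-z]+)\b", sentence.lower())
    for left, right in pairs:
        if left in inferred and right not in inferred:
            inferred[right] = inferred[left]      # same orientation on both sides
        elif right in inferred and left not in inferred:
            inferred[left] = inferred[right]
    return inferred


lexicon = propagate_over_and("This room is beautiful and spacious.", {"beautiful": 1})
```

A part-of-speech tagger would normally restrict the pairs to adjectives before propagating, so that phrases such as "bed and breakfast" are ignored.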

  34. Sentiment Analysis
     •  Conjunction and Connective (CC)
        –  Rules or constraints are also designed for other connectives (e.g. or, but, either-or, and neither-nor).
           •  This hotel is beautiful but difficult to get there.
              –  What follows the connective but is an indicator of a negative opinion.
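
One possible reading of the "but" rule is that the clause after the connective carries the opposite orientation of the clause before it when no known opinion word settles the matter. The sketch below encodes that reading; the function name, the scoring by summation and the fallback rule are assumptions.

```python
import re


def orientation_after_but(sentence, lexicon):
    """Estimate the orientation of the clause that follows 'but'.

    Returns None when the sentence contains no 'but'.
    """
    before, separator, after = sentence.lower().partition(" but ")
    if not separator:
        return None
    before_score = sum(lexicon.get(t, 0) for t in re.findall(r"[a-z]+", before))
    after_score = sum(lexicon.get(t, 0) for t in re.findall(r"[a-z]+", after))
    if after_score != 0:
        return after_score        # the clause has its own known opinion words
    return -before_score          # otherwise assume the orientation flips


result = orientation_after_but(
    "This hotel is beautiful but difficult to get there.", {"beautiful": 1}
)
```

Here "beautiful" is positive, nothing after "but" is in the lexicon, so the second clause is inferred to be negative, matching the example on the slide.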

  35. Sentiment Analysis
     •  Strength of the Polarity or Intensity or Intensification
        –  Amplifiers (very, a lot) increase the semantic intensity of a neighboring lexical item;
        –  Attenuators/Downtoners (a little, slightly) decrease it.
     •  Some approaches have implemented intensifiers using simple addition and subtraction
        –  if a positive adjective has an SO value of 2:
           •  an amplified adjective would have an SO value of 3, and
           •  a downtoned adjective an SO value of 1.
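
The addition-and-subtraction scheme for intensifiers can be written down directly. In this sketch the shift of one unit per modifier follows the example on the slide, while the treatment of negative adjectives (amplifiers push the value further below zero) is an assumption added for completeness.

```python
# Shift applied by each modifier: +1 for amplifiers, -1 for downtoners
MODIFIER_SHIFT = {"very": 1, "a lot": 1, "a little": -1, "slightly": -1}


def intensified_so(modifier, base_so):
    """Return the SO value of an adjective after applying an intensifier."""
    shift = MODIFIER_SHIFT.get(modifier, 0)
    if base_so > 0:
        return base_so + shift
    if base_so < 0:
        return base_so - shift    # assumed: amplify away from zero, downtone towards it
    return base_so                # neutral words are left unchanged


amplified = intensified_so("very", 2)
downtoned = intensified_so("slightly", 2)
```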

  36. Sentiment Analysis
     •  Negation
        –  The obvious approach to negation is simply to reverse the polarity of the lexical item next to a negator, changing good (+3) into not good (−3).
        –  Negators include not, none, nobody, never, and nothing, and other words, such as without or lack.
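
The polarity-reversal approach to negation can be sketched as follows. The negator list is taken from the slide; checking only the word immediately before the opinion word is a simplifying assumption.

```python
import re

NEGATORS = {"not", "none", "nobody", "never", "nothing", "without", "lack"}


def score_with_negation(sentence, lexicon):
    """Sum lexicon values, reversing the sign of a word preceded by a negator."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    total = 0
    for index, token in enumerate(tokens):
        so = lexicon.get(token, 0)
        if so and index > 0 and tokens[index - 1] in NEGATORS:
            so = -so              # "not good": +3 becomes -3
        total += so
    return total


negated = score_with_negation("The breakfast was not good", {"good": 3})
plain = score_with_negation("The breakfast was good", {"good": 3})
```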

  37. Polarity Recognizer in Portuguese (PIRPO)
     •  Polarity Recognizer in Portuguese to classify sentiment in online reviews.
     •  PIRPO was built from the ground up for Portuguese to recognise the polarity of user opinions in accommodation reviews.
     •  Each review is analysed according to concepts from a domain ontology.
     •  We decompose the review into sentences in order to assign a polarity to each concept of the ontology in the sentence.
     Chaves, M. S., Freitas, L., Souza, M. and Vieira, R. PIRPO: An Algorithm to deal with Polarity in Portuguese Online Reviews from the Accommodation Sector. 17th International Conference on Applications of Natural Language Processing to Information Systems (NLDB), Groningen, The Netherlands, 26-28 June 2012.

  38. PIRPO Information Architecture

  39. PIRPO
     •  Reviews
        –  Full dataset: 1500 reviews from January 2010 to April 2011 in Portuguese, English and Spanish, of which 180 are in Portuguese.
     •  Ontology Concepts
        –  The concepts used to classify the reviews are provided by Hontology, which, in its current version, has 110 concepts.

  40. PIRPO
     •  List of adjectives: it is composed of sentiment-bearing words.
        –  This list of polar adjectives in Portuguese
           •  contains 30,322 entries.
           •  is composed of the adjective and a polarity, which can take one of three values: +1, -1 and 0.
           •  These values correspond to the positive, negative and neutral senses of the adjective.
        –  PIRPO uses this list to calculate the semantic orientation of the concepts found in the sentences.
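
To make the inputs described above concrete, the following sketch splits a review into sentences and gives each ontology concept found in a sentence the summed polarity of that sentence's adjectives. It is not the PIRPO algorithm itself, whose details are not given in this transcript; the two concepts, the three adjectives and the summation rule are stand-ins chosen for illustration.

```python
import re
from collections import defaultdict

# Stand-ins for Hontology concepts: quarto (room), hotel
CONCEPTS = {"quarto", "hotel"}
# Stand-in for the list of polar adjectives: limpo (clean), barulhento (noisy), normal (ordinary)
ADJECTIVES = {"limpo": 1, "barulhento": -1, "normal": 0}


def concept_polarities(review):
    """Assign to each concept the summed adjective polarity of its sentences."""
    totals = defaultdict(int)
    for sentence in re.split(r"[.!?]+", review):
        tokens = re.findall(r"\w+", sentence.lower())
        sentence_so = sum(ADJECTIVES.get(token, 0) for token in tokens)
        for concept in CONCEPTS.intersection(tokens):
            totals[concept] += sentence_so
    return dict(totals)


# "The room is clean. The hotel is noisy."
polarities = concept_polarities("O quarto é limpo. O hotel é barulhento.")
```

Scoring per sentence and per concept, rather than per review, is what lets a single review be positive about one feature and negative about another.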

  41. PIRPO Algorithm

  42. PIRPO Measure Evaluation
     •  Precision
        $$P = \frac{|\{\text{relevantConcepts}\} \cap \{\text{retrievedConcepts}\}|}{|\{\text{retrievedConcepts}\}|}$$
     •  Recall
        $$R = \frac{|\{\text{relevantConcepts}\} \cap \{\text{retrievedConcepts}\}|}{|\{\text{relevantConcepts}\}|}$$
     •  F-score (harmonic mean of precision and recall)
        $$F = 2 \times \frac{P \times R}{P + R}$$
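
The three measures can be computed directly from two sets of concepts. The sets in the example are invented for illustration and are not PIRPO results.

```python
def precision_recall_f(relevant, retrieved):
    """Return (precision, recall, F-score) for two sets of concepts."""
    hits = len(relevant & retrieved)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Hypothetical example: concepts a human marked vs. concepts the system returned
relevant = {"room", "staff", "pool", "location"}
retrieved = {"room", "staff", "parking"}
p, r, f = precision_recall_f(relevant, retrieved)
```

Two of the three retrieved concepts are relevant (P = 2/3) and two of the four relevant concepts were retrieved (R = 1/2), which gives F = 4/7.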

  43. PIRPO Preliminary Results

  44. PIRPO: Discussion on the Results
     •  PIRPO reached a better recall for concepts with positive polarity, while mixed polarity had a higher precision.
     •  The low F-score may be mainly because the algorithm assigned a polarity to a specific concept of the ontology, while the human annotator classified the review as a whole.

  45. Outline
     Part 1
     •  Workshop Context
     •  User-Generated Content (UGC)
     •  Characterisation of UGC
     •  Knowledge Engineering - Ontology Development
     •  Hands-on Session (Individual Task): Dealing with UGC
     Part 2
     •  Knowledge Engineering - Modelling UGC
     •  Sentiment Analysis/Opinion Mining
     •  Polarity Recognizer in Portuguese (PIRPO)
     •  Information Visualisation

  46. Context Workshop
     A framework for Customer Knowledge Management based on Social Semantic Web.
     Chaves, Marcirio Silveira; Trojahn, Cássia and Pedron, Cristiane Drebes. A Framework for Customer Knowledge Management based on Social Semantic Web: A Hotel Sector Approach. In: Customer Relationship Management and the Social and Semantic Web: Enabling Cliens Conexus. Colomo-Palacios, R.; Varajão, J. and Soto-Acosta, P. (Eds.). p. 141-157, Hershey, PA: IGI Global, 2012. ISBN: 978-161-35-0044-6

  47. Information Visualisation
     •  What is the visual model of the potential end-user?
     •  How should we properly map and render:
        –  the most valued accommodation features?
        –  the perception of the quality offered by the hotel?
        –  the correlation between the guest's profile and the most relevant features?
        –  the intensity of the positivity or negativity of the features?
     •  Will the use of advanced visual techniques (such as tree-oriented ones) to map the results help accommodation managers and guests gain better insight into the data?

  48. Exploring Information Visualisation
     •  In the next figures
        –  Color is used to map the polarity and the strength of the polarity values of the CO.
        –  Size is used to map how frequently the CO is mentioned in the reviews.
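
The two visual mappings described above (colour for polarity and its strength, size for frequency) could be computed as below. The colour ramp, the -2..+2 scale, the maximum radius and both function names are assumptions for illustration and are not taken from the figures in the deck.

```python
import math


def polarity_colour(so, max_strength=2):
    """Map a semantic orientation to a hex colour: red (negative) to green (positive)."""
    level = max(-max_strength, min(max_strength, so)) / max_strength  # -1.0 .. 1.0
    intensity = int(round(255 * abs(level)))
    fade = 255 - intensity            # weaker polarity gives a paler colour
    if level > 0:
        return "#%02x%02x%02x" % (fade, 255, fade)
    if level < 0:
        return "#%02x%02x%02x" % (255, fade, fade)
    return "#ffffff"                  # neutral


def bubble_radius(frequency, max_frequency, max_radius=40.0):
    """Scale the bubble so that its area is proportional to the mention frequency."""
    return max_radius * math.sqrt(frequency / max_frequency)


strong_positive = polarity_colour(2)
strong_negative = polarity_colour(-2)
radius = bubble_radius(25, 100)
```

Scaling the area rather than the radius keeps a concept mentioned four times as often from looking sixteen times as large.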

  49. Exploring Information Visualisation
     Result of the application of Bubble Tree visualisation of the relation among concepts of the ontology, polarity (left) and strength of the polarity (right).
     •  Carvalho, E.; Chaves, M. S., 2012. Exploring User-Generated Data Visualization in the Accommodation Sector. 16th International Conference Information Visualisation, IEEE. (Submitted)

  50. Exploring Information Visualisation
     Results using Treemap visualisation of the relation among type of customer, concepts of the ontology and polarity.

  51. Questionnaire (in Spanish)
     •  You are going to receive a questionnaire about information visualisation using UGC in the context of the accommodation sector.
     •  Please click here http://kwiksurveys.com?u=Infovises to answer it.

  52. Final Remarks
     •  In-depth analysis of UGC can be used as input to improve decision making.
     •  It is time to think about new models to store UGC data.
     •  New algorithms need to be built from the ground up to deal with UGC in languages other than English.
     •  Information visualisation of UGC is in its infancy.

  53. Main References
     •  Bethard, S., Yu, H., Thornton, A., Hatzivassiloglou, V. and Jurafsky, D., 2004. Automatic extraction of opinion propositions and their holders. In Proceedings of the AAAI Spring Symposium on Exploring Attitude and Affect in Text.
     •  Chesley, P., Vincent, B., Xu, L. and Srihari, R., 2006. Using verbs and adjectives to automatically classify blog sentiment. In AAAI Symposium on Computational Approaches to Analysing Weblogs (AAAI-CAAW), 27–29.
     •  Ding, X., Liu, B. and Yu, P. S., 2008. A holistic lexicon-based approach to opinion mining. In Proceedings of the Conference on Web Search and Web Data Mining (WSDM).
     •  Hu, M. and Liu, B., 2004. Mining opinion features in customer reviews. In Proceedings of AAAI, pp. 755–760.
     •  Kim, S.-M. and Hovy, E., 2004. Determining the sentiment of opinions. In Proceedings of the International Conference on Computational Linguistics (COLING).
     •  Liu, B., 2010. Sentiment Analysis and Subjectivity. In Handbook of Natural Language Processing, Second Edition (Eds: N. Indurkhya and F. J. Damerau), CRC Press, Taylor and Francis Group, Boca Raton, FL. Chapter 28.
     •  Martin, J. R. and White, P. R. R., 2005. The Language of Evaluation, Appraisal in English. Palgrave Macmillan, London & New York.
     •  Taboada, M., Brooke, J., Tofiloski, M., Voll, K. D. and Stede, M., 2011. Lexicon-based methods for sentiment analysis. Computational Linguistics 37(2), 267–307.
     •  Tang, H., Tan, S. and Cheng, X., 2009. A survey on sentiment detection of reviews. Expert Systems with Applications 36(7), 10760–10773.
     •  Whitelaw, C., Garg, N. and Argamon, S., 2005. Using appraisal groups for sentiment analysis. In Proceedings of the 14th ACM International Conference on Information and Knowledge Management (CIKM 05). ACM, New York, NY, USA, 625–631.
     •  Wilson, T., 2008. Fine-Grained Subjectivity Analysis. PhD Dissertation, Intelligent Systems Program, University of Pittsburgh.
     •  Wilson, T., Wiebe, J. and Hoffmann, P., 2009. Recognizing contextual polarity: An exploration of features for phrase-level sentiment analysis. Computational Linguistics 35, 399–433.
     •  Wu, Y., Wei, F., Liu, S., Au, N., Cui, W., Zhou, H. and Qu, H., 2010. OpinionSeer: Interactive Visualisation of Hotel Customer Feedback. IEEE Transactions on Visualization and Computer Graphics, 6, 1109–1118. Nov-Dec.

  54. Open-source sentiment-analysis tools
     •  Python NLTK (Natural Language Toolkit): http://www.nltk.org and http://text-processing.com/demo/sentiment
     •  R, TM (text mining) module: http://cran.r-project.org/web/packages/tm/index.html
     •  RapidMiner: http://rapid-i.com/content/view/184/196/
     •  GATE, the General Architecture for Text Engineering: http://gate.ac.uk/sentiment
     •  UIMA plug-in annotators for sentiment (Apache UIMA is the Unstructured Information Management Architecture): http://uima.apache.org/
     •  Sentiment classifiers for the WEKA data-mining workbench: http://www.cs.waikato.ac.nz/ml/weka/
     •  Stanford NLP tools, maximum-entropy classification approach for sentiment: http://www-nlp.stanford.edu/software

  55. Thank you very much for your attention!!
     Questions

