Determine the sentiment of a sentence as positive or negative based on the part-of-speech tags and the emoticons present in the sentence. For this research we use Twitter, the most popular microblogging site, for sentiment orientation. In this paper we extract tweets from Twitter related to products such as mobile phones, home appliances, and vehicles. After retrieving the tweets we apply several preprocessing steps: we remove retweets, remove tweets containing fewer than five words, and remove tweets containing only URLs. The remaining tweets are then normalized: all letters are converted to lower case, punctuation is removed (since it reduces the accuracy of the results), and extra white space is stripped. We then apply a POS tagger to tag each word, so that after these steps each token is represented as a tuple (word, POS tag, English-word, stop-word). We are interested only in tweets that contain an opinion, and we eliminate the remaining non-opinion tweets from the data set; for this we use the Naïve Bayes classification algorithm. We then address short-text classification on tweets, i.e., the problem that a word can have different meanings in different domains. To solve this problem we use two feature-selection algorithms: mutual information (MI) and χ² (chi-squared) feature selection. At the final stage we predict the orientation of an opinion sentence as positive or negative, as mentioned above, using two models: a unigram model and an opinion miner.
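The filtering and normalization steps described above can be sketched in Python. This is a minimal illustration only: the regular expressions, the `RT ` retweet convention, and the exact URL test are assumptions standing in for the paper's implementation, and the POS-tagging and Naïve Bayes stages are omitted.

```python
import re

MIN_TOKENS = 5  # threshold from the paper: drop tweets with fewer than five words

def clean_tweet(text):
    """Normalization steps: lowercase, strip punctuation, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^\w\s@#]", " ", text)    # remove punctuation (keep @ and #)
    text = re.sub(r"\s+", " ", text).strip()  # collapse extra white space
    return text

def keep_tweet(text):
    """Filtering rules: drop retweets, URL-only tweets, and very short tweets."""
    if text.startswith("RT "):  # assumed retweet marker
        return False
    words = [w for w in text.split() if not w.startswith("http")]
    if not words:               # tweet contained only URLs
        return False
    return len(words) >= MIN_TOKENS

tweets = [
    "RT @user great phone!",
    "http://example.com",
    "Loving the new phone, battery life is amazing after one week!",
]
kept = [clean_tweet(t) for t in tweets if keep_tweet(t)]
print(kept)  # only the third tweet survives filtering
```

In the paper's pipeline, the cleaned tokens would then be passed through a POS tagger and the Naïve Bayes opinion filter before feature selection.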
Sentiment Analysis on Amazon Movie Reviews Dataset - Maham F'Rajput
The document summarizes a project analyzing sentiment in Amazon movie reviews using machine learning techniques. It discusses gathering an Amazon movie reviews dataset containing over 8 million reviews spanning 10+ years. The project aims to help users make more informed decisions about movies by calculating sentiment scores for each review and movie, along with point-wise mutual information scores. Experimental results show that the sentiment analysis produces accurate results on the Amazon Movie Reviews dataset, despite requiring some human labeling effort. The document outlines the problem statement, introduction, data collection, model selection, results, and areas for potential improvement.
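The point-wise mutual information score mentioned above measures how strongly a word co-occurs with a sentiment label. A minimal sketch, using a tiny hypothetical labeled corpus (the reviews and labels below are invented for illustration, not drawn from the Amazon dataset):

```python
import math
from collections import Counter

# Hypothetical labeled reviews, for illustration only
reviews = [
    ("great movie loved it", "pos"),
    ("great acting great plot", "pos"),
    ("terrible movie hated it", "neg"),
    ("boring and terrible", "neg"),
]

word_label = Counter()   # counts of (word, label) pairs
word_count = Counter()   # counts of each word
label_count = Counter()  # token count per label
total = 0
for text, label in reviews:
    for w in text.split():
        word_label[(w, label)] += 1
        word_count[w] += 1
        label_count[label] += 1
        total += 1

def pmi(word, label):
    """PMI(word, label) = log2( P(word, label) / (P(word) * P(label)) )."""
    p_joint = word_label[(word, label)] / total
    p_word = word_count[word] / total
    p_label = label_count[label] / total
    return math.log2(p_joint / (p_word * p_label))

print(pmi("great", "pos"))  # positive: "great" co-occurs with positive labels
```

A positive PMI means the word appears with that label more often than chance would predict, which is what lets per-word scores roll up into a sentiment score for a review.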
This document provides an overview of opinion mining and sentiment analysis. It defines opinion mining as attempting to automatically determine human opinion from natural language text. It discusses some key applications, such as classifying reviews and understanding public opinion. The document also outlines some challenges, such as understanding context and differing domains. It then describes common models for sentiment analysis, including preparing data, analyzing reviews linguistically, and classifying sentiment using techniques like machine learning classifiers.
Natural Language Processing in Artificial Intelligence.
What is the basic concept of text normalization? How does it work while processing human languages? What is the difference between stemming and lemmatization? How do Term Frequency and Inverse Document Frequency contribute to TF-IDF?
This version of the NLP PPT contains updated content. In the earlier one, the stemming and lemmatization processes were not taken into consideration when working with the Bag of Words algorithm; this PPT includes those corrections.
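How term frequency and inverse document frequency combine into a TF-IDF weight can be shown with a short sketch. The toy corpus below is an illustrative assumption, and note that real implementations differ in log base and smoothing choices:

```python
import math

docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "dogs and cats are pets",
]

def tf(term, doc_tokens):
    # term frequency: how often the term appears in this document
    return doc_tokens.count(term) / len(doc_tokens)

def idf(term, corpus):
    # inverse document frequency: terms rare across the corpus score higher
    df = sum(1 for d in corpus if term in d.split())
    return math.log(len(corpus) / df)

def tfidf(term, doc, corpus):
    return tf(term, doc.split()) * idf(term, corpus)

# "the" appears in two of three documents, so its IDF is low;
# "mat" appears in only one document, so it is weighted higher.
print(tfidf("the", docs[0], docs))
print(tfidf("mat", docs[0], docs))
```

The key point the slides make is visible here: TF rewards terms frequent within a document, while IDF discounts terms common across the whole corpus.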
This document provides an overview of natural language processing (NLP) and the use of deep learning for NLP tasks. It discusses how deep learning models can learn representations and patterns from large amounts of unlabeled text data. Deep learning approaches are now achieving superior results to traditional NLP methods on many tasks, such as named entity recognition, machine translation, and question answering. However, deep learning models do not explicitly model linguistic knowledge. The document outlines common NLP tasks and how deep learning algorithms like LSTMs, CNNs, and encoder-decoder models are applied to problems involving text classification, sequence labeling, and language generation.
Big Data and Natural Language Processing - Michel Bruley
Natural Language Processing (NLP) is the branch of computer science focused on developing systems that allow computers to communicate with people using everyday language.
The document provides a mini review of chatbots, from the early ELIZA chatbot created in 1966 to modern conversational agents like Alexa. It summarizes the key developments in chatbots, including ELIZA which used simple pattern matching to simulate conversations, early natural language processing chatbots like Jabberwacky and Dr. Sbaitso, and modern voice assistants from Apple, Google, Microsoft and Amazon that incorporate more advanced AI techniques. The implications of the original ELIZA chatbot are discussed, namely the tendency of users to perceive computer systems as more intelligent than their underlying programming allows.
From Natural Language Processing to Artificial Intelligence - Jonathan Mugan
Overview of natural language processing (NLP) from both symbolic and deep learning perspectives. Covers tf-idf, sentiment analysis, LDA, WordNet, FrameNet, word2vec, and recurrent neural networks (RNNs).
Foundations: Understanding Users and Interactions - Preeti Mishra
This document discusses qualitative user research methods. It explains that qualitative research helps understand user behavior, which is too complex to understand solely through quantitative data. Qualitative research methods include interviews, observation, and persona creation. Personas are fictional user archetypes created from interview data to represent different types of users. They are useful for product design by providing empathy for users and guiding decisions. The document provides details on creating personas and using scenarios to represent how personas would interact with a product.
The presentation discusses relationship elements in RDA, which have increased significantly in the new RDA Toolkit. Relationship elements describe associations between entities, with each entity playing a domain or range role. The presentation explores where RDA relationships originated from the FRBR entity-relationship model. It examines the related corporate body relationship in detail and briefly discusses other new relationship concepts like meta-data works. The goals are to help participants better understand relationships and explore additional elements in the new RDA Toolkit.
This document discusses the semantic web and its graphical representation of information. It begins with an introduction to the topic and a discussion of the need to change how the internet is used from a "web of documents" to a "web of data." It then explains key concepts of the semantic web including what semantic means, what the semantic web is, how it can be built, and the differences between the original web and semantic web. Applications of the semantic web are provided like knowledge graphs, information verification, and social networks. The document concludes by discussing the future of the semantic web.
Language is Infrastructure, for InteractConf London 2014 - Andrew Hinton
I had the pleasure of speaking at Interact London in October 2014. I presented an updated version of this talk, which I originally gave at IA Summit earlier in the spring. The talk is based on content from my book, Understanding Context. You can read more about it at http://contextbook.com.
In this version, I have updated the way I'm talking about how language works as environment: instead of 'semantic affordance' I'm now calling it 'semantic function.' (Which is in keeping with how it's now being described in the book.)
Metaphic, or the Art of Looking Another Way - Suresh Manian
For all intents and purposes, we are our words. And verbs and adjectives capture actions and sentiments better than any other tool. Metaphic is premised on the belief that a grammar book and a calculator are all you really need to make sense of web search and social media chatter, apart from all text, in general.
This document presents a project report on sarcasm analysis using machine learning techniques. It discusses how sarcasm detection is a challenging task in natural language processing due to the gap between the literal and intended meaning of sarcastic texts. The report outlines a methodology to detect sarcasm in tweets by extracting features like intensifiers and interjections and training machine learning classifiers. Naive Bayes, maximum entropy, and decision tree classifiers are tested, with decision trees achieving the highest accuracy of 63%. The conclusion discusses how accuracy could be improved by incorporating better features, and future work includes adding context and detecting sarcasm in other languages.
COMM 100 Mass Media and Society Reflection Paper Outline, Guest S.docx - monicafrancis71118
COMM 100 Mass Media and Society Reflection Paper Outline
Guest Speech Reflection Paper Outline
The reflection paper should be between 2 to 4 pages. The paper should be written in narrative form and include the following two elements (do not number them in your essay). I have provided some fictitious examples for each part of the narrative, however these are merely suggestions for what the narrative might look like and do not prescribe a specific approach. Remember, you must have both elements in your paper. Each element counts 10 points, and the structure, organization, transition, and grammar of the paper count 5 points.
1. Briefly summarize some of the key points of the presentation, and comment on them. (10 points)
For example: A social virtual reality (SVR) exists when the user feels that the environment is "real" and the user has some sense of presence in the environment. If these two conditions are not met, then the experience is not a SVR. Presence was defined as the effect the user has of feeling outside the physical body and inside a virtual body. I took this emphasis on user perception to mean that if there is more than one person in an environment at the same time (picture a virtual chat room with avatars) one person may perceive it as SVR and the other might not. For me there was ambiguity in this definition, …
2. Discuss your reactions to or impressions of the presentation. This could include thoughts about extending the research, your reaction to the talk, your observation about others’ reaction to the talk or anything else that relates to the presentation (10 points).
My final thoughts reflect on the question about the "dark side" of technological communication. Given that technology is wrought from human endeavor, I believe a dark side would exist. I think though, that if its dangers are identified and recognized that the negative effects could be mitigated. The example raised dealt with hate groups, and their ability to create a presence and rally supporters. This is a valid concern and does need to be addressed by social scientists in many fields including those in computer mediated communication. To me one of the first defenses…
Natural Language Processing PPT Presentation - Sai Mohith
A PPT presentation for a technical seminar on the topic of Natural Language Processing.
References used:
Slideshare.net
wikipedia.org NLP
Stanford NLP website
Semantic analysis is the process of machines understanding relationships between words and concepts in text to derive meaning. It involves analyzing grammatical structure and identifying connections between individual words in context. Semantic analysis tools can automatically extract meaningful information from unstructured data like emails and customer feedback. Machine learning algorithms are trained using samples of text labeled with semantic information like word meanings, relationships between entities, and more to enable accurate text analysis. The results can then be used for tasks like text classification, sentiment analysis, intent analysis, keyword and entity extraction.
This document provides a live transcript of an ALA webinar on relationship elements in RDA. The webinar host introduces the presenter, Thomas Brenndorfer, and covers some logistical information. Brenndorfer then begins his presentation on relationship elements, discussing how they have been treated more consistently in the new RDA toolkit compared to the original. He outlines the topics that will be covered, including the concepts of relationships, a case study on a specific relationship element, and how relationship elements tie into other new RDA concepts.
The document discusses the use of neural networks and deep learning techniques like word2vec and seq2seq models to develop representations of language that computers can understand without explicit symbolic representations or rules. It notes that while these techniques have achieved success, computers still lack a grounded understanding of language and the ability to reason about language based on real-world experiences and commonsense knowledge.
This is a deck I would often use highlighting the mess of website irrelevance that I call today's Microsoft.com and its associated sites.
There is way too much noise and not enough signal, and the deck hopefully highlights one slice of this reasoning.
Beyond Buzz - Web 2.0 Expo - K. Niederhoffer & M. Smith - kategn
A framework to measure a conversation based on approaches from social psychology and sociology. Beyond quantity of buzz, we propose measuring the context of conversation: the signal, person, role, and ecosystem.
Phil 2 Puzzles and Paradoxes, Prof. Sven B.docx - lorainedeserre
This document discusses Grelling's Paradox, which is a semantic paradox similar to the liar paradox. It defines the terms "heterological" and "autological" and examines whether the term "heterological" is itself heterological. It leads to a contradiction, as both assuming that "heterological" is and is not heterological results in a contradiction. The document then shifts topics to discuss future trends in training and development, including increased use of new technologies, sustainability initiatives, and advances in areas like neuroscience and data analysis that will influence the field.
Rules For Writing Numbers: Know When To Spell Them Out (YourDictionary) - Allison Thompson
Internal equity refers to fairness and consistency within an organization's compensation system. It means that employees in similar jobs with similar levels of skills, effort, responsibility, and working conditions should be paid fairly in comparison to one another.
Job evaluation is a process used by organizations to analyze and compare the relative value of different jobs. It helps establish internal equity by determining appropriate pay ranges or grades for different jobs based on factors like skill, effort, responsibility, and working conditions. Job evaluation helps ensure that pay is aligned with job content and responsibilities, thus supporting internal equity within the compensation system.
2. What are the main steps in conducting a job evaluation?
The main steps in conducting a job evaluation include:
1. Selecting
Natural Language Processing (NLP) is a branch of artificial intelligence that enables computers to understand, interpret, and generate human language. NLP analyzes text to determine meaning and relationships between words in order to automatically perform tasks like translation, information extraction, and sentiment analysis. Common applications of NLP include virtual assistants, chatbots, language translation, text extraction, and sentiment analysis of customer feedback.
Similar to Web & Social Media Analystics - Workshop Semantica (20)
"Financial Odyssey: Navigating Past Performance Through Diverse Analytical Lens"sameer shah
Embark on a captivating financial journey with 'Financial Odyssey,' our hackathon project. Delve deep into the past performance of two companies as we employ an array of financial statement analysis techniques. From ratio analysis to trend analysis, uncover insights crucial for informed decision-making in the dynamic world of finance."
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...Kaxil Naik
Navigating today's data landscape isn't just about managing workflows; it's about strategically propelling your business forward. Apache Airflow has stood out as the benchmark in this arena, driving data orchestration forward since its early days. As we dive into the complexities of our current data-rich environment, where the sheer volume of information and its timely, accurate processing are crucial for AI and ML applications, the role of Airflow has never been more critical.
In my journey as the Senior Engineering Director and a pivotal member of Apache Airflow's Project Management Committee (PMC), I've witnessed Airflow transform data handling, making agility and insight the norm in an ever-evolving digital space. At Astronomer, our collaboration with leading AI & ML teams worldwide has not only tested but also proven Airflow's mettle in delivering data reliably and efficiently—data that now powers not just insights but core business functions.
This session is a deep dive into the essence of Airflow's success. We'll trace its evolution from a budding project to the backbone of data orchestration it is today, constantly adapting to meet the next wave of data challenges, including those brought on by Generative AI. It's this forward-thinking adaptability that keeps Airflow at the forefront of innovation, ready for whatever comes next.
The ever-growing demands of AI and ML applications have ushered in an era where sophisticated data management isn't a luxury—it's a necessity. Airflow's innate flexibility and scalability are what makes it indispensable in managing the intricate workflows of today, especially those involving Large Language Models (LLMs).
This talk isn't just a rundown of Airflow's features; it's about harnessing these capabilities to turn your data workflows into a strategic asset. Together, we'll explore how Airflow remains at the cutting edge of data orchestration, ensuring your organization is not just keeping pace but setting the pace in a data-driven future.
Session in https://budapestdata.hu/2024/04/kaxil-naik-astronomer-io/ | https://dataml24.sessionize.com/session/667627
Build applications with generative AI on Google CloudMárton Kodok
We will explore Vertex AI - Model Garden powered experiences, we are going to learn more about the integration of these generative AI APIs. We are going to see in action what the Gemini family of generative models are for developers to build and deploy AI-driven applications. Vertex AI includes a suite of foundation models, these are referred to as the PaLM and Gemini family of generative ai models, and they come in different versions. We are going to cover how to use via API to: - execute prompts in text and chat - cover multimodal use cases with image prompts. - finetune and distill to improve knowledge domains - run function calls with foundation models to optimize them for specific tasks. At the end of the session, developers will understand how to innovate with generative AI and develop apps using the generative ai industry trends.
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeWalaa Eldin Moustafa
Dynamic policy enforcement is becoming an increasingly important topic in today’s world where data privacy and compliance is a top priority for companies, individuals, and regulators alike. In these slides, we discuss how LinkedIn implements a powerful dynamic policy enforcement engine, called ViewShift, and integrates it within its data lake. We show the query engine architecture and how catalog implementations can automatically route table resolutions to compliance-enforcing SQL views. Such views have a set of very interesting properties: (1) They are auto-generated from declarative data annotations. (2) They respect user-level consent and preferences (3) They are context-aware, encoding a different set of transformations for different use cases (4) They are portable; while the SQL logic is only implemented in one SQL dialect, it is accessible in all engines.
#SQL #Views #Privacy #Compliance #DataLake
Codeless Generative AI Pipelines
(GenAI with Milvus)
https://ml.dssconf.pl/user.html#!/lecture/DSSML24-041a/rate
Discover the potential of real-time streaming in the context of GenAI as we delve into the intricacies of Apache NiFi and its capabilities. Learn how this tool can significantly simplify the data engineering workflow for GenAI applications, allowing you to focus on the creative aspects rather than the technical complexities. I will guide you through practical examples and use cases, showing the impact of automation on prompt building. From data ingestion to transformation and delivery, witness how Apache NiFi streamlines the entire pipeline, ensuring a smooth and hassle-free experience.
Timothy Spann
https://www.youtube.com/@FLaNK-Stack
https://medium.com/@tspann
https://www.datainmotion.dev/
milvus, unstructured data, vector database, zilliz, cloud, vectors, python, deep learning, generative ai, genai, nifi, kafka, flink, streaming, iot, edge
2. Part 1: WARM UP
(10 min)
Exercise
Ask for a volunteer
Inside the box, ask the volunteer to look for 1 specific item
Blindfold the volunteer and ask her to find similar items, retrieve them and put them in
different places depending on their similarity
3. Part 1: WARM UP - EXPLANATION
What does it all mean?
What you have just experienced is the problem that your computer faces:
if you ask it to "find" the item that you need, it will: "find" actually
means "match what I give you with what you have in your db".
What your computer is not able to do is to put similar things
together or to separate the different ones.
Or better, it's not able to make new categories which include similar items.
In other words, topics ;)
4. But why isn't your computer able to make up such new categories?
The answer is pretty straightforward... because it does not know what those objects are and "mean".
Everything your computer sees in a text is a series of characters. So, in a sentence like
"Roberto is having great fun in this workshop!",
what your computer actually sees is just...
"Xxxxxx zv gdatdin dhdp3 axnwbx sdxn hwbxwbx xbwxhbjwx!"
That is, it just does not know what those sequences of chars stand for.
And the only things it can put together are "similar shapes".
That's why you need to tag...
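The "find means match" point above can be made concrete with a tiny sketch (all data here is invented for illustration): a literal search only retrieves exact character sequences, so a paraphrase that a human would instantly group with the query is simply missed.

```python
# A literal search only matches character sequences, not meaning.
documents = [
    "Roberto is having great fun in this workshop!",
    "The workshop participants are enjoying themselves.",
    "Quarterly revenue grew by 4%.",
]

def find(query, docs):
    """'Find' for a computer: match the given characters against the db."""
    return [d for d in docs if query.lower() in d.lower()]

print(find("workshop", documents))    # matches 2 docs, purely by characters
print(find("having fun", documents))  # -> []: misses "having great fun"
```

The second query fails even though a human reads it as the same idea — the computer has only "similar shapes" to go on, which is exactly why tagging (adding meaning from outside) is needed.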
5. PART 2
WHAT IS “SEMANTICS”
AND WHAT IS IT FOR A COMPUTER?
6. THERE ARE MANY TYPES OF MEANING
Semantics is meaning. But first of all, let’s broaden the
meaning of what “meaning” means :P
Actually, we should better talk with plurals, ie. meanings.
There are several types of meanings, and each one
depends on the purpose of the communication (or better,
communicative action)
7. Some examples of types of meaning could be:
Referents, labels, relations
Events
Text cohesion
Units of analysis
User intentions
Context: implications and consequences
Homo Sapiens (or just people, ehe) can understand all these types of meaning, and many more.
8. BUT WHAT CAN A COMPUTER UNDERSTAND?
A lot, for sure. But a computer does not have (yet) the knowledge about the
world that a human being has gained since she was born. By "knowledge of
the world" we mean all possible information registered in many ways,
from biological perception to cultural education and social habits.
If this all sounds far away from your need to make a tool work, think of it like this:
there are so many implied and underlying meanings in the text your tool is
processing that it just does not know anything about. That's why you need
to cover its gaps in knowledge.
9. REFERENTS, LABELS AND RELATIONS
A REFERENT is the OBJECT: a person, a company, an event (oh, btw, these are also the
SmartThemes in TalkWalker!).
A Referent can carry more than just one LABEL: a name is a label for a person, or a
company, or even an event (eg. a title for a conference). For example, the curly guy writing
these slides is named Roberto. And he's also a Digital Analyst. And he's also the guy travelling
to and from Bergamo every day. These are all ways (labels) that could be used to refer to the
same Referent (object) ALTERNATIVELY.
This is important to your computer (and to you behind it; btw, why aren't you sitting in the front?)
because the writer of a text could refer to the same object in many ways, and you could miss
out on some results because you didn't set up those keywords in your query.
10. REFERENTS, LABELS AND RELATIONS
And then there are SYNONYMS, ie. things that are similar and thus sometimes occur in
the same contexts, close to one another. Unfortunately, a computer doesn't know that 2
words are synonyms, unless you instruct it that they are. Thus, 2 or more words can have a
RELATION of synonymy (or antonymy, to put it veeeery simply) and belong to the same
SEMANTIC AREA.
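One simple way to "instruct" the computer that words belong to the same semantic area is a hand-built synonym dictionary used for query expansion. A minimal sketch (the lexicon below is invented for illustration):

```python
# Hand-built semantic areas: the computer only 'knows' these words are
# related because we tell it so.
SEMANTIC_AREAS = {
    "cheap": {"cheap", "inexpensive", "affordable", "budget"},
    "expensive": {"expensive", "pricey", "costly"},
}

def expand_query(keyword):
    """Return the keyword plus every synonym in its semantic area."""
    for area in SEMANTIC_AREAS.values():
        if keyword in area:
            return sorted(area)
    return [keyword]  # unknown word: no expansion possible

print(expand_query("pricey"))  # -> ['costly', 'expensive', 'pricey']
```

This is exactly the trick behind broad-match search: the query is silently rewritten to cover the whole semantic area, so you stop missing results just because the writer picked a different label.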
11. Referents, labels, relations
Entities
Have attributes and features and are involved in events
Could be referred to with different labels
Are in relations with other similar entities, which have
different names, but which sometimes are used in their
place
12. BUT WHAT CAN A COMPUTER UNDERSTAND?
First of all, a computer lacks all of this knowledge
about the world and the language. This is why tech
giants are building it.
Schema.org
Google’s Knowledge graph
15. EVENT STRUCTURES - FRAMES
The basic idea is that one cannot understand the meaning of a single word without
access to all the essential knowledge that relates to that word.
For example, one would not be able to understand the word "sell" without knowing
anything about the situation of commercial transfer, which also involves, among
other things, a seller, a buyer, goods, money, the relation between the money
and the goods, the relations between the seller and the goods and the money, the
relation between the buyer and the goods and the money and so on.
Thus, a word activates, or evokes, a frame of semantic knowledge relating to the
specific concept it refers to (or highlights, in frame semantic terminology)
17. EVENT STRUCTURES
So, in events there are
PEOPLE DOING THINGS TO OTHER PEOPLE, maybe WITH SOME THING
So far, so good.
Imagine if you could relate the entities you identify to the actions they take.
Imagine if your computer could do that….
And indeed, there are projects working on the description of events. Ever heard of FrameNet?
18. UNDERSTANDING TEXTS
Understanding a text is definitely much more than plainly reading it (or its
"graphical shapes"). Especially when it comes to relating it to other texts and
making coherent, meaningful collections, ie. grouping into topics.
In order to have a computer understand even the plainest meaning of a sentence
(let alone a text) we would need at least:
A dictionary: to provide linguistic information (eg. grammatical metadata)
An ontology: to relate entities and frame them into event structures
This is the future of the Semantic Web. But that is already another story...
19. USER'S INTENTIONS
What is an "intention"?
It's a form of meaning, ie. pragmatic: something that is
present in one's mind.
Intentions are made of both the motivation to take an action
and the result one wants to achieve by that action.
Intentions show the background where the motivation was
formed (eg. emotionally) and the direction where the user's
attention is heading.
20. In search: how does a search engine satisfy the query with a broader
scope?
It uses a dictionary (with synonyms, variants, etc. all included in the
algorithm): the case of Google's broad match type.
In social networks:
Posts, comments, shares: which one counts more?
A scale of "original" content
Why did Fb introduce reactions?
To give a limited and predefined set of emotions beyond the Like button
Are they a source for sentiment analysis? (yes, to count in social media analytics, with
breakdown)
21. And what about text? How can a computer understand the intention of a (written or spoken)
text?
IT IS A HUUUUUUUUUUUUGGGEEE QUESTION
Most tech giants are working hard on this, making great efforts in developing algorithms and
computing systems. Here's an excerpt from the Microsoft Speech Technology Dept.:
Intent understanding is about identifying the action a user wants a computer to take or the
information she/he would like to obtain, conveyed in a spoken utterance or a text query.
22. USER'S INTENTIONS
... and in what way are they of interest to ORM?
Identifying the background and the direction of an intention may provide a "path" of
action. Which could potentially be a pattern (in which it could be possible to intervene)
Words can be ambiguous (not clearly connoted, or not used in their literal meaning,
eg. irony). Identifying the intention of use can help attribute the best-fitting sentiment
Computers don't have (yet) the ability to "sense" the overall mood of a situation
Considering the pragmatic dimension of intention (and context, later) broadens the
perspective of ORM beyond keywords and metrics, and can help in writing more significant
insights
23. Context: implications and consequences
Context meaning can be conceived of as the meaning that is received by people (more
or less) independently of the speaker's intentions
Or, said differently, the effects that a communicative act brings about within the environment
it enters
Shares on social networks: a case of "mute meaning"
The intention is to amplify the attention on the news, to make it one's own, to show support
If no comments are added, shares are examples of how content spreads the consequences of the
original content
24. PART 3: SENTIMENT ANALYSIS
Units of analysis for sentiment attribution
Word?
Sentence?
Document?
Discourse?
Topic / Theme?
Data-driven approach
Exc. 2
Top-down?
Bottom-up?
25. WHAT IS AN OPINION? ABSTRACTION 1
“I bought an iPhone a few days ago. It is such a nice phone. The touch screen
is really cool. The voice quality is clear too. It is much better than my old
Blackberry, which was a terrible phone and so difficult to type with its tiny
keys. However, my mother was mad with me as I did not tell her before I
bought the phone. She also thought the phone was too expensive, ...” (Liu, Ch.
in NLP handbook, 2010)
One can look at this review/blog at the:
document level, i.e., is this review + or -?
sentence level, i.e., is each sentence + or -?
entity and feature/aspect level
26. Entity and aspect/feature level
“I bought an iPhone a few days ago. It is such a nice phone. The touch screen is really cool.
The voice quality is clear too. It is much better than my old Blackberry, which was a terrible
phone and so difficult to type with its tiny keys. However, my mother was mad with me as I
did not tell her before I bought the phone. She also thought the phone was too expensive,
...”
What do we see?
Opinion targets: entities and their features/aspects
Sentiments: positive and negative
Opinion holders: persons who hold the opinions
Time: when opinions are expressed
27. OPINION LOGIC STRUCTURE
An opinion is a quintuple
(e_j, a_jk, so_ijkl, h_i, t_l)
where
e_j is a target entity.
a_jk is an aspect/feature of the entity e_j.
so_ijkl is the sentiment value of the opinion from the opinion holder h_i on aspect a_jk of entity
e_j at time t_l. so_ijkl is positive, negative, or neutral, or a more granular rating.
h_i is an opinion holder.
t_l is the time when the opinion is expressed.
Opinion definition (Liu, Ch. in NLP handbook, 2010)
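Liu's quintuple maps naturally onto a record type. A minimal sketch in Python (the field names are ours, chosen to follow the definition above; the example values come from the iPhone review):

```python
from dataclasses import dataclass

@dataclass
class Opinion:
    """Liu's opinion quintuple (e_j, a_jk, so_ijkl, h_i, t_l)."""
    entity: str     # e_j: the target entity
    aspect: str     # a_jk: an aspect/feature of the entity
    sentiment: str  # so_ijkl: 'positive', 'negative', 'neutral', or a rating
    holder: str     # h_i: who holds the opinion
    time: str       # t_l: when the opinion is expressed

op = Opinion("iPhone", "touch screen", "positive",
             "the reviewer", "a few days ago")
print(op.entity, op.aspect, op.sentiment)
```

Once every opinionated sentence is reduced to such a record, the "unstructured" text behaves like any other table of data.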
28. HOW TO USE THIS OPINION LOGIC STRUCTURE?
With this logic, it's possible to tackle the problem of structuring the unstructured.
Goal: given an opinionated document,
discover all quintuples (e_j, a_jk, so_ijkl, h_i, t_l),
or solve some simpler form of the problem, e.g., sentiment classification at the document or
sentence level.
With the quintuples, it's possible to convert unstructured text into structured data.
Traditional data and visualization tools can then be used to slice, dice and visualize the results.
However, as seen in the logic structure, tools need to have dictionaries and ontologies built in.
It is then possible to enable qualitative and quantitative analysis.
29. OPINION SUMMARY (ABSTRACTION 2)
With a lot of opinions, a summary is necessary.
It's a multi-document summarization task.
For factual texts, summarization means selecting the most important facts and presenting them in a
sensible order while avoiding repetition:
1 fact = any number of repetitions of the same fact.
But for opinion documents it is different, because opinions have a quantitative side and have
targets:
1 opinion ≠ any number of the same opinion (the count matters).
An aspect-based summary is more suitable:
the quintuples form the basis for opinion summarization.
30. Aspect-based opinion summary
(Hu & Liu, 2004)
"I bought an iPhone a few days ago. It is such a nice phone. The touch screen is really cool. The voice quality is clear
too. It is much better than my old Blackberry, which was a terrible phone and so difficult to type with its tiny keys.
However, my mother was mad with me as I did not tell her before I bought the phone. She also thought the phone was
too expensive, ...”
Feature Based Summary of iPhone:
Feature1: Touch screen
Positive: 212
The touchscreen was really cool.
The touch screen was so easy to use and can do amazing things.
...
Negative: 6
The screen is easily scratched.
I have a lot of difficulty in removing finger marks from the touch screen.
...
Feature2: voice quality
…
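The Hu & Liu-style feature-based summary above is, at bottom, a group-by on (entity, aspect) with the supporting sentences collected per polarity. A sketch (plain tuples instead of the full quintuple for brevity; the data is invented, echoing the iPhone example):

```python
from collections import defaultdict

# (entity, aspect, sentiment, sentence) tuples extracted from reviews
opinions = [
    ("iPhone", "touch screen", "positive", "The touch screen is really cool."),
    ("iPhone", "touch screen", "positive", "So easy to use."),
    ("iPhone", "touch screen", "negative", "The screen is easily scratched."),
    ("iPhone", "voice quality", "positive", "The voice quality is clear too."),
]

def aspect_summary(opinions):
    """Group opinions by (entity, aspect), collecting sentences per polarity."""
    summary = defaultdict(lambda: {"positive": [], "negative": []})
    for entity, aspect, sentiment, sentence in opinions:
        summary[(entity, aspect)][sentiment].append(sentence)
    return summary

s = aspect_summary(opinions)
print(len(s[("iPhone", "touch screen")]["positive"]))  # -> 2
```

The counts per polarity ("Positive: 212 / Negative: 6" in the slide) are just the lengths of these lists, and the example sentences shown under each count are the list contents.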
31. ASPECT-BASED OPINION SUMMARY
This approach also seems more suitable for ORM purposes,
because the variety and fragmentation of target objects is extremely
wide when it comes to summarizing the reputation of products and
people.
Indeed, it allows one to break down more aspects of an object and to
assess them.
33. DATA-DRIVEN APPROACH
Exc 2.
Think of today's presentation's parts.
On colored sticky notes, write the part that you liked the most (green), medium (yellow) and
the least (red).
The parts were:
Topic grouping
Semantics
Sentiment and opinions
34. Top-down?
The one that we currently use:
we identify some documents and assign them some sentiment.
It's a "fake bottom-up approach", because we can't read all documents (for limited time and
resources).
Bottom-up?
A more fine-grained approach: sentiment tagging at word, sentence, or document level?
App2check tags at sentence level, then calculates the average sentiment of all sentences and assigns it to the
document (single review).
The document sentiment is compared and matched against the user's rating, as a control measure.
Topics are also rated: topics are identified as keywords within the opinionated sentences, and their
sentiment is calculated on average.
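The bottom-up scheme just described (tag each sentence, average into a document score, then check it against the user's star rating as a control) can be sketched like this. All scores, the threshold, and the 1-5 rating scale are assumptions for illustration, not App2check's actual implementation:

```python
def document_sentiment(sentence_scores):
    """Average sentence-level polarity scores (-1..+1) into one document score."""
    return sum(sentence_scores) / len(sentence_scores)

def matches_rating(doc_score, user_rating, threshold=0.0):
    """Control measure: does the computed polarity agree with the star rating?"""
    predicted_positive = doc_score > threshold
    rated_positive = user_rating >= 3  # assuming a 1-5 star scale
    return predicted_positive == rated_positive

scores = [0.8, 0.5, -0.2]            # one polarity score per sentence
doc = document_sentiment(scores)     # ~0.37: a mildly positive review
print(matches_rating(doc, user_rating=4))  # -> True
```

A mismatch between the averaged score and the user's rating flags the review for inspection, which is exactly the point of using the rating as a control measure.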