The document discusses using semantics and social network analysis to detect signals from real-time social web data, such as citizen reports on events, in order to understand social perceptions and extract insights. It presents examples of tools like Twitris that analyze tweets about news events and the BBC SoundIndex that tracks online music popularity. The research aims to better understand how social media can be used as a proxy for tracking popular opinions, events, and buying preferences.
1. Detecting Signals from Real-time Social Web
Semantic Social Networking Panel @ STC 2010
June 24, 2010
Amit Sheth
Kno.e.sis, Ohio Center of Excellence in Knowledge-enabled Computing
Wright State University, Dayton, OH
Thanks - Meena Nagarajan, Kno.e.sis
2. Our Approach
• Semantics of ‘Semantic Social Networking’
• Bottom-up and top-down
• Statistical semantics powered by domain model
semantics
• Social Networks of Interest
• Not the friend/peer/co-author network
• Event/topic oriented dynamic networks
3. Dynamic Social Networks: Citizen
Journalism, Online Communities..
http://www.telegraph.co.uk/news/worldnews/asia/india/3530640/Mumbai-attacks-Twitter-and-Flickr-used-to-break-news-Bombay-India.html
5. Other Areas of Focus
“I decided to check out Wanted demo today even though I really did not like the movie”
“It was THE HANGOVER of the year..lasted forever.. so I went to the movies..bad choice picking “GI Jane” worse now”
WHAT: Named entity recognition, topics..
6. Other Areas of Focus
“Looking for a cheap body shop mechanic in Dayton OH” - Transactional
“Check out these links..” - Information Sharing
“Where can I find a good psp cam” - Information Seeking
WHAT: Named entity recognition, topics..
WHY: User intent identification ...
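The three intent labels on this slide can be illustrated with a very small cue-phrase classifier. This is only a sketch under stated assumptions: the cue phrases, the `CUES` table, and the `classify_intent` function are hypothetical and stand in for the intent-identification method of the cited work, which the slides do not describe.

```python
# Hypothetical cue phrases per intent label; the labels come from the slide,
# the phrases are illustrative assumptions.
CUES = {
    "Transactional": ("looking for", "want to buy", "need a"),
    "Information Sharing": ("check out", "here is", "fyi"),
    "Information Seeking": ("where can i", "how do i", "does anyone know"),
}


def classify_intent(post: str) -> str:
    """Return the first intent whose cue phrase occurs in the post, else 'Unknown'."""
    text = post.lower()
    for intent, cues in CUES.items():
        if any(cue in text for cue in cues):
            return intent
    return "Unknown"


# The three example posts from the slide.
posts = [
    "Looking for a cheap body shop mechanic in Dayton OH",
    "Check out these links..",
    "Where can I find a good psp cam",
]
labels = [classify_intent(p) for p in posts]
```

A lookup like this breaks down quickly on informal text, which is one reason intent identification is treated as a research problem rather than a keyword match.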
7. Other Areas of Focus
Male: “I graduated in '04 from USC... now working in Austin... I like stuff, and i like doing stuff. What stuff do you like to do?”
Female: “Well Im a pretty easy going person. Love the outdoors and going camping, boating, fishing, short weekend trips,the horseraces, drag races, hanging out at home, doing yard work,or just watching movies or having BBQ's with friends.”
WHAT: Named entity recognition, topics..
WHY: User intent identification ...
HOW: Word usages and an active population..
8. Other Areas of Focus
WHAT (NER): “Context and Domain Knowledge Enhanced Entity Spotting in Informal Text”, The 8th International Semantic Web Conference, 2009
“A Measure of Extraction Complexity: a Novel Prior for Improving Recognition of Cultural Entities”, Manuscript in preparation
WHY (Intents): “Monetizing User Activity on Social Networks - Challenges and Experiences”, International Conference on Web Intelligence, 2009
HOW: “An Examination of Language Use in Online Dating Personals”, 3rd Int'l AAAI Conference on Weblogs and Social Media, 2009
9. Sample showcases
Social Computing @ Kno.e.sis
• Social perceptions behind events : Twitris
http://twitris.knoesis.org
• Online popularity of music artists: BBC Sound
Index (IBM Almaden)
http://www.almaden.ibm.com/cs/projects/iis/sound/
10. http://twitris.knoesis.org/
TWITRIS
online pulse of a populace around news-worthy
events..
Mumbai terror attack, Health care debate ..
11. Chatter around news-worthy
events..
Hundreds of tweets, facebook posts, blogs about a single event
multiple narratives, strong opinions, breaking news..
12. TWITRIS : Twitter+Tetris
• WHAT are people saying, WHEN and from
WHERE
• Browse citizen reports using social perceptions
as the fulcrum
• Citizen reports in context by overlaying them
with Web articles!
13. What, When and Where:
The Power of Spatio-Temporal-
Thematic slices
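A minimal sketch of what a spatio-temporal-thematic slice could look like in code, assuming each citizen report has been reduced to a (place, day, text) record. The `Report` type, the sample rows, the stop-word list, and the use of raw term counts as "themes" are all illustrative assumptions, not a description of how Twitris extracts its descriptors.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    place: str  # WHERE the report came from
    day: str    # WHEN it was posted (ISO date string)
    text: str   # WHAT was said


# Tiny illustrative stop-word list (an assumption, not a real resource).
STOP_WORDS = {"the", "a", "in", "on", "of", "is", "and", "to"}


def thematic_slices(reports, top_n=3):
    """Group reports by (place, day) and return the most frequent terms per slice."""
    counts = defaultdict(Counter)
    for r in reports:
        terms = [t for t in r.text.lower().split() if t not in STOP_WORDS]
        counts[(r.place, r.day)].update(terms)
    return {
        key: [term for term, _ in counter.most_common(top_n)]
        for key, counter in counts.items()
    }


# Invented sample reports, for illustration only.
reports = [
    Report("Mumbai", "day-1", "attack reported in Mumbai"),
    Report("Mumbai", "day-1", "hotel attack ongoing"),
    Report("Dayton", "day-2", "health care debate"),
]
slices = thematic_slices(reports)
```

Each key of `slices` is one (where, when) cell, and its value is the "what" for that cell, so the same data can be browsed by location, by date, or by theme.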
18. Summaries of Citizen Reports
RT @WestWingReport: Obama reminds the faith-based
groups "we're neglecting 2 live up 2 the call" of being R
brother's keeper on #healthcare
19. Find resources related to social perceptions
Social Media in Context
SOYLENT GREEN and the HEALTH CARE REFORM
• News and Wikipedia articles to put extracted descriptors in context
• Information right where you need it!
• Exploit spatio, temporal semantics for thematic aggregation
21. Spatial Aggregation
Assisted by a model of a domain/event...
22. Twitris - A Village Effort!
We are very excited for what is to come!
Stay Tuned!
http://twitris.knoesis.org/
23. Things we are working on..
• Factual vs. Opinionated tweets
• Polarized opinions: what is breaking up a
community
• Joe Wilson: “You lie!”
• Personalized Tweets: what do people like me
think about X.
• Customizing it to events you want to track!
• Trust in Social Media & Content ...... and much more!
24. http://www.almaden.ibm.com/cs/projects/iis/sound/
BBC SoundIndex (IBM Almaden)
Pulse of the Online Music Populace
Daniel Gruhl, Meenakshi Nagarajan, Jan Pieper, Christine Robson, Amit Sheth:
“Multimodal Social Intelligence in a Real-Time Dashboard System”, to appear in a special issue of the VLDB Journal on "Data
Management and Mining for Social Networks and Social Media", 2010
25. The Vision
http://www.almaden.ibm.com/cs/projects/iis/sound/
• Netizens do not always buy their music, let alone buy in a CD store.
• Traditional sales figures are a poor indicator of music popularity.
• What is ‘really’ hot?
• BBC: Are online music communities good proxies for popular music listings?!
• BBC SoundIndex - “A pioneering project to tap into the online buzz surrounding artists and songs, by leveraging several popular online sources”
26. “Multimodal Social Intelligence in a Real-Time
Dashboard System”, VLDB Journal 2010 Special Issue:
Data Management and Mining for Social Networks
and Social Media.
• Artist/Track Metadata
• User metadata, unstructured, structured attention metadata
27. “Multimodal Social Intelligence in a Real-Time Dashboard System” (continued)
Album/Track identification
Sentiment Identification
Spam and off-topic comments
UIMA Analytics Environment
28. “Multimodal Social Intelligence in a Real-Time Dashboard System” (continued)
Extracted concepts into
explorable data structures
30. “Multimodal Social Intelligence in a Real-Time Dashboard System” (continued)
What are 18 year olds in London
listening to?
Crowd-sourced preferences
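The question on this slide amounts to slicing annotated comments by age and location and then counting artists. A minimal sketch, assuming every comment already carries an artist, the poster's age and city, a spam flag, and a sentiment label. The `Comment` layout, the sample rows, and the choice to score by positive non-spam comments are invented for illustration and do not reflect the actual SoundIndex schema or scoring.

```python
from collections import Counter
from typing import NamedTuple


class Comment(NamedTuple):
    artist: str
    age: int
    city: str
    is_spam: bool
    sentiment: str  # "positive", "negative" or "none"


def top_artists(comments, age, city, n=3):
    """Rank artists by positive, non-spam comments within one (age, city) slice."""
    hits = Counter(
        c.artist
        for c in comments
        if c.age == age
        and c.city == city
        and not c.is_spam
        and c.sentiment == "positive"
    )
    return hits.most_common(n)


# Invented sample rows, for illustration only.
comments = [
    Comment("Artist A", 18, "London", False, "positive"),
    Comment("Artist A", 18, "London", False, "positive"),
    Comment("Artist B", 18, "London", False, "positive"),
    Comment("Artist B", 18, "London", True, "positive"),   # spam, ignored
    Comment("Artist C", 25, "London", False, "positive"),  # other age, ignored
]
ranking = top_artists(comments, age=18, city="London")
```

Changing the `age` and `city` arguments corresponds to picking a different slice of the same data, which is what makes the extracted concepts explorable.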
31. The Word on the Street
Billboard's Top 50 Singles chart during the week of Sept 22-28 ’07
vs. MySpace popularity charts
Table 8 Billboard’s Top Artists vs. our generated list (showing Top 10)
Billboard.com     MySpace Analysis
Soulja Boy        T.I.
Kanye West        Soulja Boy
Timbaland         Fall Out Boy
Fergie            Rihanna
J. Holiday        Keyshia Cole
50 Cent           Avril Lavigne
Keyshia Cole      Timbaland
Nickelback        Pink
Pink              50 Cent
Colbie Caillat    Alicia Keys
32. The Word on the Street
Billboard's Top 50 Singles chart during the week of Sept 22-28 ’07
vs. MySpace popularity charts
* Several Overlaps: top artists appear in both lists
* Predictive power of MySpace - Billboard next week looked a lot like MySpace this week..
* Teenagers are big music influencers [MediaMark2004]
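The overlap claim can be checked directly against the two Top 10 columns of Table 8. The sketch below assumes the rows of that table are listed in rank order; the helper names `overlap` and `rank_shifts` are hypothetical and only illustrate one simple way to compare two ranked lists.

```python
# Top 10 columns of Table 8, assumed to be in rank order.
billboard = ["Soulja Boy", "Kanye West", "Timbaland", "Fergie", "J. Holiday",
             "50 Cent", "Keyshia Cole", "Nickelback", "Pink", "Colbie Caillat"]
myspace = ["T.I.", "Soulja Boy", "Fall Out Boy", "Rihanna", "Keyshia Cole",
           "Avril Lavigne", "Timbaland", "Pink", "50 Cent", "Alicia Keys"]


def overlap(a, b):
    """Artists present in both lists, in the order they appear in `a`."""
    in_b = set(b)
    return [artist for artist in a if artist in in_b]


def rank_shifts(a, b):
    """For shared artists, 1-based rank in `b` minus 1-based rank in `a`."""
    pos_b = {artist: i for i, artist in enumerate(b, start=1)}
    return {
        artist: pos_b[artist] - i
        for i, artist in enumerate(a, start=1)
        if artist in pos_b
    }


shared = overlap(billboard, myspace)
shifts = rank_shifts(billboard, myspace)
```

Here `shared` holds the artists common to both lists and `shifts` shows how far each one moves between them; a negative shift means the artist ranks higher in the MySpace list than on Billboard.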
33. Powerful Proxies for Popularity
• “Which list more accurately reflects the artists that were more popular last week?”
• 75 participants
• Overall 2:1 preference for MySpace list
• Younger age groups: 6:1 (8-15 yrs)
Challenging traditional polling methods!
Table 7 Annotation Statistics
38% of total comments were spam
61% of total comments had positive sentiments
4% of total comments had negative sentiments
35% of total comments had no identifiable sentiments
Excerpt shown on slide: As described in Section 8, the structured metadata (artist name, timestamp, etc.) and annotation results (spam/non-spam, sentiment, etc.) were loaded in the hypercube.
34. Details here..
Social Computing research at Kno.e.sis
http://knoesis.wright.edu/research/semweb/projects/socialmedia/
Meena Nagarajan’s research on understanding user-
generated content
http://knoesis.wright.edu/researchers/meena/
35. Semantic Social Networking Panel @ STC 2010
• How can we use the Social Web to detect and observe signals from
real time social data?
• How to study diversity and change, identify patterns of interactions,
and extract insights
• What can we learn about social perceptions of real time events?
• Tools for visualization and analysis in space, time and theme
• Can social network analysis be trusted?
• Capturing social network content to track and analyze buyer
preferences, shopping experience, demographics, and other
characteristics that influence purchasing behavior
Editor's Notes
My research has focused on three different understanding challenges associated with user-generated content (UGC), all with the goal of adding structure to unstructured content.
in each of these areas I have contributed specific algorithms and techniques, several of which are published efforts..
mention names of techniques
collaborations
the first work that I want to tell you about has been a joint collaboration with researchers at IBM over the last 2 years
It is a deployed social web application aimed at real-time analytics of music popularity using data from social networks - basically using crowd-sourced social intelligence for business intelligence
BBC - a platform for ingesting content from popular online sources for music discussion to generate Billboard-like popularity .. except from user chatter
differs from traditional polling
there are two kinds of data that go into SoundIndex
one structured - here you are seeing the structured artist metadata
but this also includes structured attention metadata - user listens, plays
second type - unstructured text
significant volume -> user attention to this space
Ingesting into a common format - fetch and process are separate
point polling along with ongoing verification with subject matter experts (DJs)
Top 45 - showing 10
however for SI we were interested in one dimensional lists
talk about ordering overlaps
We conclude that new opportunities for self expression on the web provide a more accurate place to gather data on what people are really interested in than traditional methods. The even stronger results from the younger audience suggest that this trend is, if anything, accelerating.