TweetProbe: A Real-Time Microblog Stream Visualization
Byungkyu Kang, George Legrady and Tobias Höllerer
FourEyes Lab
Unive...
TweetProbe : A Real-Time Microblog Stream Visualization Framework

Published in: Education, Technology
  • Good afternoon, everyone. I am Byungkyu Kang from UC Santa Barbara. I'm here today to talk about our paper, 'TweetProbe', a real-time microblog stream visualization framework. Before I begin, I would like to thank everyone for coming to listen to my talk.
  • As social networks become more important in our daily lives, their data become a valuable source for various practices. For example, many companies use social media to find consumer patterns for their products or services.
    What about individual users? They also need to selectively consume the information that they want or need.
    Then, what kinds of issues do we encounter in those tasks?
    Since there is an enormous amount of data on various social media, we have a scalability issue. Given our limited resources and time, it is very difficult to assess and comprehend the data.
    Moreover, most topics come and go very quickly. Therefore, we need to find these temporal topics in a timely manner. Here we call them 'emerging topics' or 'transient messages'.
  • Then, what can we do to deal with these challenging problems?
    With the streaming service, we have access to a huge number of messages in real time. We thought about harnessing this resource with an effective visualization framework so that users can easily find emerging information or news for a given keyword of interest.
    In this paper, we propose a novel visualization design for real-time microblog streams. The main research question is 'How can we effectively visualize transient trending topics?'
  • Here is some work related to our approach.
    Since our system has two different layers, we studied the literature in two parts.
    First, Social Stream Filtering and Detection, which is what our back-end data processing does.
    -‘TwitterMonitor’ uses the QueueBurst algorithm to detect bursty keywords in tweet texts. The TweetProbe framework instead detects bursty retweets, focusing on ‘retweeting behavior’ rather than keyword-based patterns. However, we also believe that we can find bursty trends through the 4th view mode, which detects trending hashtags.
    -Milicic et al. developed a storyboard-based approach to shared media curation. Their framework collects microposts shared on social platforms that contain media items as the result of a query. Their visual storyboard shows results as a stream of images, clusters or a timeline of items.
    -Esparza et al. proposed a system called 'Catstream', a user profiling approach based on the topical categorization of users’ posted URLs.
    Second, Real-time Visualization.
    -In real-time visualization, most of the research in the literature has focused on network intrusion detection systems (IDS) or infrastructure monitoring. Since timely alerts are a crucial factor in these systems, they visualize essential status information in a very simple visual structure.
    -On the other hand, Kamvar and Harris developed a web-based framework called 'We feel fine', which visualizes the emotional web. This system processes blog postings in the back-end, extracts the sentiment of their contents and visualizes them as aesthetic circular components in various colors. However, it is not a fully real-time approach.
    -As another example, a recent web-based monitoring service called Tweetping was created by Paris-based developer Franck Ernewein. This service is the most recent real-time microblog visualization; it shows the number of messages being posted in each country around the world. This visualization tells us the distribution of messages in real time regardless of the topic or category of the messages.
  • Next, I want to talk briefly about the system architecture of the framework.
  • -As Twitter provides a streaming API, we can access micro-postings in real time. Through the network connection, we receive incoming data in JSON format as they arrive and parse them to get each message and its metadata.
    -Once a message is parsed, it goes through a filter with a given query. In this process, we store only matching messages in a cache and discard the others. This loop runs for as long as we keep the connection to Twitter.
    -As new data come into the cache, they are passed to an array and sorted. Regular messages and retweets are treated differently.
    -When each message is stored as an object, its text and metadata are interpreted to extract sentiment and other information.
    -Up to this point, the back-end data processing layer is in charge.
    -While the back-end process works, the front-end visualization layer runs independently, updating the screen with new incoming messages.
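
The back-end steps described above (parse, filter by query, separate retweets from regular messages) can be sketched roughly as follows. This is a minimal illustration, not the TweetProbe implementation: the function name `process_stream`, the bounded `deque` caches and the substring match are hypothetical, and the field names `text` and `retweeted_status` are assumed to follow Twitter's tweet JSON.

```python
import json
from collections import deque


def process_stream(lines, query, cache_size=1000):
    """Parse JSON lines, keep messages matching the query, split tweets from retweets.

    Hypothetical sketch: cache sizes and the substring filter are assumptions.
    """
    tweets = deque(maxlen=cache_size)    # cache of regular messages
    retweets = deque(maxlen=cache_size)  # cache of retweets
    needle = query.lower()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            status = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not valid JSON
        if not isinstance(status, dict):
            continue
        text = status.get("text", "")
        if needle not in text.lower():
            continue  # discard messages that do not match the query
        # Assumed convention: a retweet carries the original in "retweeted_status".
        if "retweeted_status" in status:
            retweets.append(status)
        else:
            tweets.append(status)
    return tweets, retweets
```

In a real deployment `lines` would be the open streaming connection; here any iterable of strings works, which keeps the filter easy to test offline.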
  • Now I'm going to talk about the design considerations of the framework.
  • We have four different view modes: Sentiment Map, Emerging Retweet Ranking, Retweet Count Ranking and Hashtag Ranking.
    Except for the first, the sentiment map, the other three view modes show real-time ranking visualizations. I’ll talk more about this later.
  • Our design is inspired by rain drops.
    When observing message-posting behavior on the stream, we could see that each message arrives irregularly. This random distribution of messages is analogous to that of rain drops.
    On a rainy day, one can see rain drops falling here and there, making circular waves of different sizes on a puddle. We made an analogy between the size of the waves and the influence of each user. Therefore, the various sizes of the waves in our design represent differences in the dissemination ability of each message.
    Here we interpret the stream of microposts as a flow in a continuous medium and each message as a discontinuous element in that flow.
  • In this framework, we apply a logarithmic timeline in order to show the original posting time of each message while focusing more on recent events.
    Through the log-scale timeline approach, both old and new messages can be seen together, with more detail shown for the latest ones.
    This visualization technique was first introduced in the Histomap of Evolution by Sparks in 1932.
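
One way to picture the log-scale timeline is as a mapping from message age to a position on the axis. The sketch below is an assumption for illustration only: the function name, the 24-hour span, the pixel width and the use of `log1p` are not taken from the talk.

```python
import math


def log_timeline_position(age_seconds, max_age_seconds=86400.0, width=1000.0):
    """Map a message's age to an x-position on a logarithmic timeline.

    0 is "now" and `width` is the oldest visible age. All parameters are
    hypothetical defaults chosen for illustration.
    """
    # Clamp the age to the visible range.
    age = min(max(age_seconds, 0.0), max_age_seconds)
    # log1p keeps age 0 at position 0 and stretches recent ages apart.
    return width * math.log1p(age) / math.log1p(max_age_seconds)
```

With these defaults, the most recent minute takes up far more of the axis than it would on a linear scale, which is the "more detail for the latest messages" effect described above.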
  • Now let’s look at the individual view modes of our visualization.
  • The first view is dubbed the sentiment map. Here, each tweet is depicted as a rain drop along with its circular wave animation.
    Since the number of followers of each user is taken as its potential dissemination power in the network, we mapped it to the size of the drop and the duration of the circular animation.
    Each drop is also spatially mapped to the grid on the screen and is color-coded according to its sentiment score. Red and blue represent positive and negative sentiments.
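
The visual mapping of the sentiment map can be sketched as a small function from follower count and sentiment score to drop attributes. Everything concrete here is assumed for illustration: the log scaling of followers, the radius and duration ranges, the reading of red as positive and blue as negative, and the gray fallback for a neutral score are not specified in the talk.

```python
import math


def drop_style(followers, sentiment, min_radius=4.0, max_radius=40.0,
               max_followers=1_000_000):
    """Return (radius, wave_duration_seconds, color) for one tweet.

    Hypothetical mapping: follower count drives both drop size and wave
    duration; the sign of the sentiment score drives the color.
    """
    clamped = min(max(followers, 0), max_followers)
    # Assumed log scaling so a few very large accounts do not dominate.
    scale = math.log1p(clamped) / math.log1p(max_followers)
    radius = min_radius + (max_radius - min_radius) * scale
    duration = 1.0 + 4.0 * scale  # assumed range: 1 to 5 seconds
    if sentiment > 0:
        color = "red"    # assumed: positive sentiment
    elif sentiment < 0:
        color = "blue"   # assumed: negative sentiment
    else:
        color = "gray"   # assumed: neutral fallback
    return radius, duration, color
```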
  • The following three view modes are the real-time ranking visualizations.
    In these visualizations, the sliding animation and the logarithmic timeline are the two main visual components. The sliding animation reveals the emerging retweets, and the logarithmic timeline shows the freshness of each message.
  • Because the two retweet rankings look similar, I am going to talk about how they differ.
    Here we have two different graphs.
    Graph A shows the number of incoming messages in each time frame. Given a unit time 'delta t', we can see the variation in the retweet count of each message.
    Graph B, on the other hand, shows the same data in a different way. Since it is drawn as a cumulative distribution function, we can see how many times each message has been retweeted since its birth as time passes.
    As you might notice, graph A is the derivative of graph B. Our emerging retweet ranking visualization corresponds to graph A, and the retweet count ranking visualization corresponds to graph B.
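
The relation between the two graphs can be written out as follows. The notation is an assumption layered on the slide's labels "n/∆t" and "∑n": n(τ) is taken to be the number of retweets of a message in the time bin of width ∆t ending at τ.

```latex
B(t) = \sum_{\tau \le t} n(\tau),
\qquad
A(t) = \frac{n(t)}{\Delta t}
     = \frac{B(t) - B(t - \Delta t)}{\Delta t}
     \approx \frac{dB}{dt}
```

Under this reading, a message ranks high in A only while it is actively being retweeted, whereas its value in B never decreases.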
  • The emerging retweet ranking view shows the top N emerging retweets. The default time window is set to 10 minutes, but it can be modified by the user.
    Through the sliding animation, users can see how new emerging retweets take their place in real time.
  • In contrast, in the retweet count ranking, messages are sorted by their total retweet counts. Note that it shows only active retweets, since we are harnessing the currency of the information stream.
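
The two retweet rankings can be sketched with one small class. This is an illustration under stated assumptions, not the TweetProbe code: the class and method names are hypothetical, timestamps are assumed to arrive in non-decreasing order, and an "active" retweet is assumed to mean a message retweeted at least once within the current time window.

```python
from collections import Counter, deque


class RetweetRanker:
    """Hypothetical sketch of the emerging and total-count retweet rankings."""

    def __init__(self, window_seconds=600.0):  # 10-minute default window
        self.window = window_seconds
        self.events = deque()    # (timestamp, original_id), in arrival order
        self.totals = Counter()  # cumulative retweet count per original message

    def add(self, timestamp, original_id):
        """Record one incoming retweet of the message `original_id`."""
        self.events.append((timestamp, original_id))
        self.totals[original_id] += 1

    def _recent_counts(self, now):
        # Drop events that have fallen out of the sliding window.
        while self.events and self.events[0][0] <= now - self.window:
            self.events.popleft()
        return Counter(oid for _, oid in self.events)

    def top_emerging(self, now, n=10):
        """Top N by retweets inside the window (n per delta t)."""
        return self._recent_counts(now).most_common(n)

    def top_count(self, now, n=10):
        """Top N by cumulative count, restricted to active messages."""
        active = self._recent_counts(now)
        ranked = sorted(active, key=lambda oid: self.totals[oid], reverse=True)
        return [(oid, self.totals[oid]) for oid in ranked[:n]]
```

A message with many old retweets and one recent retweet would lead `top_count` but trail in `top_emerging`, which is the difference between the two views.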
  • Lastly, the hashtag ranking.
    In this visualization, we show the top N hashtags in the given time window, varying the size of each item. The size of both the text and the sliding box is mapped to the hashtag's share of the hashtag count in the ranking. Since a hashtag is considered the topic of a message, we can interpret this as the trending topic.
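
A minimal sketch of the size mapping for the hashtag ranking follows. The function name, the point-size range, the case folding and the choice to compute each share over the top N only are assumptions for illustration.

```python
from collections import Counter


def hashtag_sizes(hashtags, n=10, min_pt=12.0, max_pt=48.0):
    """Return [(hashtag, font_size)] for the top N hashtags.

    Hypothetical mapping: font size grows linearly with the hashtag's
    share of all occurrences among the top N.
    """
    counts = Counter(tag.lower() for tag in hashtags)
    top = counts.most_common(n)
    total = sum(count for _, count in top)
    if total == 0:
        return []
    return [(tag, min_pt + (max_pt - min_pt) * count / total)
            for tag, count in top]
```

For example, `hashtag_sizes(["#royalbaby"] * 3 + ["#London"])` gives the dominant hashtag three quarters of the available size range and the other one quarter.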
  • In this example, we can see a possible application scenario for the TweetProbe system.
    We monitored the emerging retweets about the royal baby on the 22nd of July. As can be seen here, people shared different postings on Twitter as time went on. For example, at 8:40pm, the official information began to spread through the network with detailed numbers such as the time of birth and the weight of the baby.
    Again, at 9:00pm, people started sharing a photo with a link to the official source of the announcement.
    This tells us that an emerging topic or message on microblogs can be replaced by another at any second.
  • Conclusion
  • TweetProbe is a real-time visualization framework carefully designed for visualizing trending microposts and topics.
    In visual language, it tries to conceptualize a 'continuum of discontinuity' using metaphoric visual components such as rain drops.
    The four visualization techniques comprise various components such as sentiment visualization on a binary color scale, a logarithmic timeline and sliding animation.
  • Getting back to the research question, we asked how to effectively visualize transient trending topics.
    To answer this question, we have proposed a multi-threaded visualization technique for social streams and shown how to identify emerging messages and sentiments in a massive amount of data in real time.
    We have also shown our new visual components.
  • You can also see and try this visualization at the Art Show exhibition.
    We welcome you to the Art Show Opening Session today at 6pm, room A705.
    Thank you.
  • Ieee visap bkang

    1. 1. TweetProbe A Real-Time Microblog Stream Visualization Byungkyu Kang, George Legrady and Tobias Höllerer FourEyes Lab University of California Santa Barbara
    2. 2. Motivation • Microblogs are Valuable Source - To selectively consume news and information - Difficult to assess and comprehend enormous data - Majority of contents in microblogs are transient topics To analyze social dynamics • Scalability Issue • Temporal Topics
    3. 3. Research Question • Novel visualization design on real-time microblog stream - How to effectively visualize transient trending topics?
    4. 4. Related Work • Social Stream Filtering and Detection - ‘TwitterMonitor’ takes user feedback (Mathioudakis and Koudas 2010) - Network Intrusion Detection (Cyber-Security Situational Awareness) Storyboard based Shared Media Curation (Milicic et al. 2013) ‘Catstream’ employs user profiling approach (Esparza et al. 2013) • Real-time Visualization ‘Tweetping’ Geo-spatial mapping of real-time messages (http://tweeping.net) ‘We feel fine’ visualizes emotional web (Kamvar and Harris, 2011)
    5. 5. System Architecture Back-end Data Processing Front-end Visualization Layer
    6. 6. System Architecture BACK-END DATA PROCESSING Twitter Streaming API UDP Real-time Tweet Stream #Query temp temp RT TW Cache Loop JSON Parsing Extract Metadata StatusListener FRONT-END VISUALIZATION UPDATE RT TW SORT Sentiment Map Storage Text / Metadata Processing Emerging Retweet Ranking Burst Detection Retweet Count Ranking Sentiment Extraction Top Hashtag Ranking
    7. 7. Design Considerations
    8. 8. 4 View Modes Sentiment Map Retweet Count Ranking Emerging Retweet Ranking Hashtag Ranking
    9. 9. Design - Motivation • Rain Drops - Stream of Information : Flow in a Continuous Medium (Stream) Message : Discontinuous Element in a Flow “Continuum of Discontinuity” Wallpaper image is an excerpt from http://www.paqoo.com
    10. 10. Logarithmic Timeline The Histomap of Evolution The former logarithmic timeline visualization of geologic and human history, by John B. Sparks (1932) Logarithmic Timeline The logarithmic time in TweetProbe shows the original posting time of messages focusing more on recent events
    11. 11. Visualization Sentiment Map Emerging Retweet Ranking Retweet Count Ranking Hashtag Ranking
    12. 12. Sentiment Map • Rain-drop Like Message Visualization - Tweet = Rain drop + Circular Wave - # Follower Defines Drop Size and Duration Time - Each Drop is Spatially Mapped to Grid • Potential of Dissemination • Visual Mapping Color-coded Sentiment Score
    13. 13. Real-time Ranking Visualization EMERGING RETWEET Transient emerging tweets TOP-COUNT RETWEET Top retweets in CDF EMERGING HASHTAGS Transient emerging #hashtags • Main Visual Components - Sliding animation to reveal emerging retweets. Logarithmic timeline to show the ‘freshness’ of messages
    14. 14. Emerging vs Top Retweets (1) • A (Emerging Retweet Ranking; axes: #msg vs. time t) shows n/∆t - Easy to detect transient trending message • B (Retweet Count Ranking; axes: #msg vs. time t) shows ∑n - Cumulative Distribution Function (CDF) of (A)
    15. 15. Emerging Retweet Ranking • Top N emerging retweets - n/∆t : Number of binned RT within a time-window • Sliding animation shows transition in rank • Color-coded with markers in timeline
    16. 16. Retweet Count Ranking • Top N retweets - ∑n : Shows top RTs (RT counts in CDF) - Shows only alive retweets • Incoming RTs in real-time
    17. 17. Hashtag Ranking • Top N hashtags - Emerging topics of messages Text size is mapped with its ratio of hashtag count
    18. 18. Example: #royalbaby #royalbaby 22nd of July, 2013 - 900,000 hashtags (#royalbaby) - 25,300 Tweets/min at peak More than 2 million mentions of the news https://blog.twitter.com/en-gb/2013/royalbaby-0
    19. 19. TweetProbe Conclusion
    20. 20. Conclusion • Novel visualization design - Real-time visualization for trending microposts and topics Conceptualize ‘Continuum of Discontinuity’ Metaphoric visual components such as rain drops Color-coded sentiment visualization Logarithmic timeline with sliding animation Catch transient emerging topics using 3 ranking view modes
    21. 21. Research Question Revisited • Novel visualization design on real-time microblog stream - How to effectively visualize transient trending topics? Multi-thread based visualization Identify emerging messages / hashtags with sentiments New visual components (rain drop and sliding window)
    22. 22. Thank You! Come and try TweetProbe @Art Program Today at 6pm A705
