TREND DETECTION, TRACKING & TRANSITION in Social Networks
1. Definition & General Idea 
2. Web Samples in Trend Hunting 
3. Detection Approaches
4. Architecture: TwitterMonitor 
5. Detection: MemeTracker 
6. Classification: ExoEndo 
SemioNet: Semantic Social Network Analysis
REFERENCES 
Mathioudakis, Michael, and Nick Koudas. "TwitterMonitor: trend detection over the Twitter stream." Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data. ACM, 2010.
Leskovec, Jure, Lars Backstrom, and Jon Kleinberg. "Meme-tracking and the dynamics of the news cycle." Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2009.
Naaman, Mor, Hila Becker, and Luis Gravano. "Hip and trendy: Characterizing emerging trends on Twitter." Journal of the American Society for Information Science and Technology 62.5 (2011): 902-918.
Becker, Hila, Mor Naaman, and Luis Gravano. "Beyond Trending Topics: Real-World Event Identification on Twitter." ICWSM 11 (2011): 438-441.
Trend Analysis 
The Science of Studying Changes in Social Patterns, Including Fashion, Technology & Consumer Behavior
Horizontal Analysis
The General Movement over TIME of a Statistically Detectable Change
Fundamentally, a Method for Understanding HOW & WHY Things have Changed – or will Change – over TIME
APPLICATION
APPROACH
Text Mining
Topic Ident. & Clust.
"Kilroy was here" was a piece of graffiti that became popular in the 1940s and existed under various names in different countries, illustrating how a meme can be modified through replication.
Memes
A meme (/ˈmiːm/) is "an idea, behavior, or style that spreads from person to person within a culture" … through writing, speech, gestures, rituals, or other imitable phenomena with a mimicked theme. … Memes are cultural analogues to genes in that they self-replicate, mutate, and respond to selective pressures.
GroupBurst: Assesses Co-occurrences of Bursty Keywords in Recent Tweets
One-pass
Real-time
Adjustable against spam
Theoretically sound!
Adjustable against SPURIOUS Bursts: Coincidental Bursts of a Keyword over a Short Period of Time
Context Extraction Algorithms (PCA, SVD) & Grapevine’s Entity Extractor to Add more
271 Million Monthly Active Users 
500 Million Tweets (140 ch) Per Day 
78% Active Users on Mobile 
77% Accounts Outside U.S. 
Supports 35+ languages
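The bursty-keyword idea behind GroupBurst can be illustrated with a small Python sketch. This is not TwitterMonitor's actual algorithm: the burst rule (current count at least `ratio` times a historical average), the `min_cooc` threshold, and the function names are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations


def bursty_keywords(history_avg, window_counts, ratio=3.0, min_count=2):
    """Keywords whose count in the current window far exceeds their history.

    history_avg: hypothetical precomputed average count per window.
    """
    return {
        word
        for word, count in window_counts.items()
        if count >= min_count and count >= ratio * history_avg.get(word, 0.0)
    }


def group_bursty(tweets, bursty, min_cooc=2):
    """Group bursty keywords that co-occur in at least min_cooc recent tweets."""
    pair_counts = Counter()
    for tweet in tweets:
        present = sorted(bursty & set(tweet.lower().split()))
        pair_counts.update(combinations(present, 2))

    # Union-find over keywords: co-occurring keywords end up in one group.
    parent = {word: word for word in bursty}

    def find(word):
        while parent[word] != word:
            parent[word] = parent[parent[word]]
            word = parent[word]
        return word

    for (a, b), n in pair_counts.items():
        if n >= min_cooc:
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for word in bursty:
        groups[find(word)].add(word)
    return sorted(sorted(group) for group in groups.values())


tweets = [
    "earthquake chile now",
    "chile earthquake strong",
    "ipad launch today",
    "ipad launch apple",
    "earthquake news",
]
window = Counter(word for tweet in tweets for word in tweet.split())
history = {"earthquake": 0.5, "chile": 0.2, "ipad": 0.5, "launch": 0.4, "news": 5.0}
bursty = bursty_keywords(history, window)
trends = group_bursty(tweets, bursty)
```

Each resulting group is a candidate trend; a real system would also need the spam and spurious-burst safeguards listed on the slide.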
MemeTracking 
News Cycle 
Tracking News Evolution 
Quotes & Memes 
Integral Part of Journalistic Practice 
Travel Relatively Intact with Mutational Variants 
Clustering by Graph
Item: Each News Article/Blog Post 
Phrase: A Quoted String that Occurs in Items
Phrase Graph 
DAG 
$|P| < |Q|$
“senseless killing”
“enough of senseless killing”
“Hear our voice. We have had enough of this senseless killing”
Directed Edit Distance$(P, Q) < \delta$
Word Consecutive Overlap$(P, Q) > k$
$P \to Q$
$W_{P,Q} \propto \dfrac{1}{\text{Directed Edit Distance}(P, Q)}$
$W_{P,Q} \propto \text{Total Number of } Q \text{ in Corpus}$
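A minimal Python sketch of the phrase-graph construction, under stated assumptions: words are tokens; the directed edit distance is taken to be the fewest word edits needed to turn P into some contiguous part of Q (a common approximate-substring definition, not necessarily the authors' exact one); and the weight `count(Q) / (1 + distance)` is only one hypothetical way to combine the two proportionalities. The k-word overlap condition is omitted for brevity.

```python
import re


def tokens(phrase):
    """Lowercased word tokens (punctuation dropped)."""
    return re.findall(r"[a-z0-9']+", phrase.lower())


def directed_edit_distance(p, q):
    """Fewest word edits turning token list p into a contiguous part of q."""
    prev = [0] * (len(q) + 1)  # matching may start anywhere in q at no cost
    for i in range(1, len(p) + 1):
        cur = [i] + [0] * len(q)
        for j in range(1, len(q) + 1):
            cost = 0 if p[i - 1] == q[j - 1] else 1
            cur[j] = min(prev[j - 1] + cost, prev[j] + 1, cur[j - 1] + 1)
        prev = cur
    return min(prev)  # matching may end anywhere in q


def build_phrase_graph(phrase_counts, delta=2):
    """Edges P -> Q for shorter P close to Q; weights are an assumed form."""
    toks = {phrase: tokens(phrase) for phrase in phrase_counts}
    edges = {}
    for p, tp in toks.items():
        for q, tq in toks.items():
            if len(tp) < len(tq):
                d = directed_edit_distance(tp, tq)
                if d < delta:
                    edges[(p, q)] = phrase_counts[q] / (1 + d)
    return edges


short = "senseless killing"
mid = "enough of senseless killing"
full = "Hear our voice. We have had enough of this senseless killing"
graph = build_phrase_graph({short: 10, mid: 4, full: 2})
```

Because every edge runs from a strictly shorter phrase to a longer one, the resulting graph cannot contain cycles, which is why it is a DAG.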
Phrase Clusters 
Directed Acyclic Graph (DAG) Partitioning 
Given a Weighted DAG, Delete a Set of Edges of 
Min Total Weight So That Each of the Resulting 
Components is Single-Rooted. 
NP-hard 
Heuristic 
1. Start from the Roots
2. Proceed Down the DAG & Greedily Assign each Node to the Cluster to which it has the Most Edges
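The greedy heuristic can be sketched in Python as follows. The sketch assumes that edges point from a shorter phrase to a longer phrase containing it, that a root is a node with no outgoing edges, and that ties are broken alphabetically; it counts edges rather than edge weights, as stated on the slide. The function name is hypothetical.

```python
from collections import Counter, defaultdict


def partition_dag(edges):
    """Assign every node to the cluster of exactly one root.

    edges: iterable of (p, q) pairs, meaning phrase p is included in phrase q.
    Returns a dict mapping each node to the root that names its cluster.
    """
    successors = defaultdict(list)
    nodes = set()
    for p, q in edges:
        successors[p].append(q)
        nodes.update((p, q))

    cluster = {}

    def assign(node):
        if node in cluster:
            return cluster[node]
        targets = successors.get(node, [])
        if not targets:
            cluster[node] = node  # a root starts its own cluster
        else:
            # Successors are assigned first, then this node joins the
            # cluster to which most of its edges lead.
            votes = Counter(assign(target) for target in targets)
            cluster[node] = max(sorted(votes), key=lambda c: votes[c])
        return cluster[node]

    for node in sorted(nodes):
        assign(node)
    return cluster


edges = [("a", "A"), ("a", "B"), ("b", "B"), ("c", "a"), ("c", "b"), ("c", "B")]
clusters = partition_dag(edges)
```

Edges that cross cluster boundaries are the ones implicitly deleted; the heuristic does not guarantee that their total weight is minimal.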
Result 
Volume Distribution 
Dataset 
3 Months: Aug 1 to Oct 31, 2008
~1M Docs per Day from 1.65 Million Sites!
47M Phrases, 22M Distinct
9 Hours Clustering Process Time
35,800 Non-trivial Clusters (at least two phrases)
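Heavy-tailed volume distributions of this kind are commonly inspected through the complementary cumulative distribution function (CCDF) on log-log axes. A minimal sketch of the empirical CCDF, assuming the convention P(X ≥ x) and a hypothetical list of per-phrase volumes:

```python
def empirical_ccdf(volumes):
    """Return (x, P(X >= x)) for each distinct volume x, in ascending order."""
    n = len(volumes)
    return [
        (x, sum(v >= x for v in volumes) / n)
        for x in sorted(set(volumes))
    ]


# Hypothetical volumes (mentions per phrase), for illustration only.
ccdf = empirical_ccdf([1, 1, 2, 4])
```

Plotting the pairs with both axes logarithmic makes a power-law tail appear as a straight line.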
ThemeRiver 
Other Findings 
Time lag between the news media and blogs
$f(n_j)\,\delta(t - t_j)$
$n_j$ = Number of Items Previously Written for Cluster $j$
$t$ = the current time
$t_j$ = the time when $j$ was first produced
Recency → $\delta$ is monotonically decreasing in $t - t_j$
Imitation → $f$ is monotonically increasing in $n_j$, $f(0) > 0$
$t \to 0^-$: $a = 0.076$; $t \to 0^+$: $a = 0.092$
$t \to 0^-$: $b = 1.77$; $t \to 0^+$: $b = 2.15$
Quotes migrating from blogs to news media: 3.5% 
Each Cluster 
Modeling the news trend 
Imitation≠Recency 
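The interplay of imitation and recency can be illustrated with a short Python sketch in which a source picks cluster j with probability proportional to f(n_j)·δ(t − t_j). The concrete choices f(n) = n + 1 and δ(age) = 1 / (1 + age) are assumptions that merely satisfy the stated monotonicity conditions; they are not the fitted functions from the paper.

```python
def choice_probabilities(threads, t,
                         f=lambda n: n + 1.0,
                         delta=lambda age: 1.0 / (1.0 + age)):
    """Probability of writing about each cluster j at time t.

    threads: dict mapping j to (n_j, t_j), where n_j is the number of items
    already written and t_j is the time the cluster first appeared.
    f must be increasing with f(0) > 0; delta must be decreasing.
    """
    scores = {j: f(n_j) * delta(t - t_j) for j, (n_j, t_j) in threads.items()}
    total = sum(scores.values())
    return {j: score / total for j, score in scores.items()}


# Hypothetical clusters: "a" is popular but old, "b" is popular and newer,
# "c" is brand new with no items yet.
probs = choice_probabilities({"a": (10, 0.0), "b": (10, 5.0), "c": (0, 9.0)}, t=10.0)
```

With these assumed functions, neither popularity alone nor recency alone determines the choice: "b" wins because it scores well on both.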
Characterizing Trends 
“trends in trend data.” → Meta Trend
Taxonomy of the trends
Key Distinguishing Features of Trends
Not only the Textual Content
Social Network Structure
Ties
Geographic
Action → Retweet, Reply, Mention, Hashtag
Trends
  Exogenous
    Broadcast-media
      Broadcast of local media
        “fight” (boxing event)
        “Ravens” (football game)
      Broadcast of global/national media
        “Kanye” (Kanye West acts up at the MTV Video Music Awards)
        “Lost Finale” (series finale of Lost)
    Global News
      Breaking
        “earthquake” (Chile earthquake)
        “Tsunami” (Hawaii Tsunami warning)
        “Beyoncé” (Beyoncé cancels Malaysia concert)
      Nonbreaking
        “HCR” (health care reform)
        “Tiger” (Tiger Woods apologizes)
        “iPad” (toward the launch of Apple’s popular device)
    National Holidays & Memorial Days
      “Halloween,” “Valentine’s”
    Local Participatory & Physical
      Planned
        “marathon”
        “superbowl” (Super Bowl viewing parties)
        “patrick’s” (St. Patrick’s Day Parade)
      Unplanned
        “rainy,” “snow”
  Endogenous
    Memes
      #in2010 (in December 2009, users imagine their near future)
      “November” (users marking the beginning of the month on November 1)
    Retweets
    Fan Community Activities
      “2pac” (the anniversary of the death of hip-hop artist Tupac Shakur)
Ttwitter: Trends from twitter.com
Tterm freq.: Trends from Simple Trend Detector
Tquality: Trends for Quality Analysis → Supervised Categories
Tquantity: Trends for Computing Features
Content Features
• Average number of words/characters
• Proportion of messages with URLs, unique URLs, with hashtags excluding/including trend terms
• Top unique hashtag?
• Similarity to centroid
Interaction Features
• Proportion of retweets, replies, mentions
Time-based Features
• Exponential fit head, tail
• Logarithmic fit head, tail
Participation Features
• Messages per author
• Proportion of messages from top author
• Proportion of messages from top 10% of authors
Social Network Features
• Level of reciprocity
• Maximal eigenvector centrality
• Maximal degree centrality
• Transitivity
• Density
• Average component size
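A few of the content, interaction, and participation features can be computed with a short Python sketch. The message format (dicts with `author` and `text`) and the string heuristics for spotting retweets, replies, and mentions are assumptions for illustration, not the feature definitions used by the authors.

```python
from collections import Counter


def trend_features(messages):
    """Compute simple per-trend features from a non-empty list of messages.

    messages: list of dicts with hypothetical keys "author" and "text".
    """
    n = len(messages)
    texts = [m["text"] for m in messages]
    per_author = Counter(m["author"] for m in messages)

    is_retweet = [t.startswith("RT @") for t in texts]  # assumed convention
    is_reply = [t.startswith("@") for t in texts]       # assumed convention
    is_mention = [
        "@" in t and not rt and not rp
        for t, rt, rp in zip(texts, is_retweet, is_reply)
    ]

    return {
        "url_proportion": sum("http" in t for t in texts) / n,
        "hashtag_proportion": sum("#" in t for t in texts) / n,
        "retweet_proportion": sum(is_retweet) / n,
        "reply_proportion": sum(is_reply) / n,
        "mention_proportion": sum(is_mention) / n,
        "messages_per_author": n / len(per_author),
        "top_author_proportion": per_author.most_common(1)[0][1] / n,
    }


sample = [
    {"author": "a", "text": "RT @bob: earthquake in Chile http://example.com"},
    {"author": "a", "text": "@carol did you feel the earthquake?"},
    {"author": "b", "text": "Thinking of @dave after the #earthquake"},
    {"author": "c", "text": "earthquake!"},
]
features = trend_features(sample)
```

Feature vectors of this kind, computed per trend, are what the exogenous versus endogenous comparison operates on.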
Exogenous vs. Endogenous Trends
H1.1 Content features: Exogenous trends have a higher proportion of URLs and a smaller proportion of hashtags
H1.2 Interaction features: Exogenous trends have fewer retweets and a similar number of replies
H1.3 Time features: Exogenous trends differ in the head period before the trend peak but exhibit similar time features in the tail period after the peak, compared to endogenous trends
H1.4 Social network features: Exogenous trends have fewer connections and less reciprocity
TRANSITION 
Alluvial Diagrams
IDEA 
Automatic Categorization of Trends 
Photography Trend → Selfie Image
Trust Trend → Trustful Users, Trustful Tweets
Untrendy People! Users who Counteract the Trends

Editor's Notes

  • #5 Vertical Analysis: Financial Managers Set One Accounting Item as the Benchmark & Compare Other Items with that Numerical Standard. In Contrast, Horizontal Analysis: Study of Performance Trends over Time (Short, Intermediate, Long; Past, Now, Future).
  • #8 Automatic trend detection over the twitter stream
  • #9 The approach tracks distinctive phrases that travel relatively intact through online text. Scalable algorithms for clustering textual variants of such phrases identify a broad class of memes that exhibit widespread mutation. A central computational challenge in this approach is therefore to find robust ways of extracting and identifying all the mutational variants of each of these distinctive phrases, and to group them together.
  • #11 Words are treated as tokens. The dependence of the edge weight on the number of occurrences of q is important, since we particularly wish to preserve edges (p, q) when the inclusion of p in q is supported by many occurrences of q.
  • #12 Collections of Phrases Deemed to be Close Textual Variants of One Another
  • #14 CCDF: Complementary Cumulative Distribution Function. If the quantity of interest is power-law distributed with exponent $\gamma$, $p(x) \propto x^{-\gamma}$, then when plotted on log-log axes the CCDF will be a straight line with slope $-(\gamma - 1)$, because integrating $x^{-\gamma}$ lowers the exponent by one. The tail is much heavier, which means that variants of popular phrases, like “lipstick on a pig,” are much “stickier” than would be expected from the overall phrase volume distribution. Popular phrases have many variants, and each of them appears more frequently than an “average” phrase.
  • #15 To put “lipstick on a pig” (it does not make it a lady) is a rhetorical expression used to convey the message that making superficial or cosmetic changes is a futile attempt to disguise the true nature of a product. A comparable Persian proverb on adornment: “Whether you wear gold brocade or satin, you are still the same cardoon seller.”
  • #16 Focus on the 1,000 threads with the largest total volumes (i.e., the largest number of mentions). Thread volume in blogs typically reaches its peak 2.5 hours after the peak thread volume in the news sources. Thread volume in news sources increases slowly but decreases quickly, while in blogs the increase is rapid and the decrease much slower.
  • #18 reflect an ever-updating real-time live image of our society.
  • #19 Exogenous Trends
    • Broadcast-media events:
      ◦ Broadcast of local media events: “fight” (boxing event), “Ravens” (football game).
      ◦ Broadcast of global/national media events: “Kanye” (Kanye West acts up at the MTV Video Music Awards), “Lost Finale” (series finale of Lost).
    • Global news events:
      ◦ Breaking news events: “earthquake” (Chile earthquake), “Tsunami” (Hawaii Tsunami warning), “Beyoncé” (Beyoncé cancels Malaysia concert).
      ◦ Nonbreaking news events: “HCR” (health care reform), “Tiger” (Tiger Woods apologizes), “iPad” (toward the launch of Apple’s popular device).
    • National holidays and memorial days: “Halloween,” “Valentine’s.”
    • Local participatory and physical events:
      ◦ Planned events: “marathon,” “superbowl” (Super Bowl viewing parties), “patrick’s” (St. Patrick’s Day Parade).
      ◦ Unplanned events: “rainy,” “snow.”
    Endogenous Trends
    • Memes: #in2010 (in December 2009, users imagine their near future), “November” (users marking the beginning of the month on November 1).
    • Retweets (users “forwarding” en masse a single tweet from a popular user): “determination” (users retweeting LL Cool J’s post about said concept).
    • Fan community activities: “2pac” (the anniversary of the death of hip-hop artist Tupac Shakur).
  • #22 Breaking News vs. Other Exogenous Trends
    H2.1: Interaction features of breaking events will be different than those of other exogenous trends, with more retweets (forwarding), but fewer replies (conversation).
    H2.2: Time features of breaking events will be different for the head period, showing more rapid growth, and a better fit to the functions’ curve (i.e., less noise) compared to other exogenous trends.
    H2.3: Social network features of breaking events will be different than those of other exogenous trends.
    Local Events vs. Other Exogenous Trends
    H3.1: Content features of local events will be different than those of other exogenous trends.
    H3.2: Interaction features of local events will be different than those of other exogenous trends; in particular, local events will have more replies (conversation).
    H3.3: Time features of local events will be different than those of other exogenous trends.
    H3.4: Social network features of local events will be different than those of other exogenous trends; in particular, local events will have denser networks, more connectivity, and higher reciprocity.
    Memes vs. Retweet Endogenous Trends
    H4.1: Content features of memes will be different than those of retweet trends.
    H4.2: Interaction features of memes will be different than those of retweet trends; in particular, retweet trends will have significantly more retweet (forwarding) messages (this hypothesis is included as a “sanity check” since the retweet trends are defined by having a large proportion of retweets).
    H4.3: Time features of memes will be different than those of retweet trends.
    H4.4: Participation features of memes will be different than those of retweet trends.
    H4.5: Social network features of memes will be different than those of retweet trends; in particular, meme trends will have more connectivity and higher reciprocity than retweet trends.