Discovering and Navigating Memes in Social Media

Invited talk at SBP 2012: Intl. Conf. on Social Computing, Behavioral-Cultural Modeling, & Prediction (April 3, 2012). Based on paper by Ryu, Lease, and Woodward, to appear at ACM HyperText 2012. Joint work with Hohyon Ryu and Nicholas Woodward.

  1. Discovering and Navigating Memes in Social Media
      • Matt Lease, School of Information, University of Texas at Austin
      • ml@ischool.utexas.edu, @mattlease
      • Joint work with Hohyon Ryu & Nicholas Woodward
      • Paper to appear at HyperText 2012: 23rd ACM Conference on Hypertext and Social Media
  2. April 3, 2012. SBP 2012: Intl. Conf. on Social Computing, Behavioral-Cultural Modeling, & Prediction
  3. Critical Reading (Literacy)
      • Context-awareness (how work is situated)
          – Related works, Time/Place, Author…
      • Recognizing & questioning
          – Sources of Influence
          – Positions, Assumptions, Bias, …
      • New challenges online
          – Scale, authorship, citing of sources, borrowing…
      • Traditional approach: education
  4. Inspiration #1: Living Stories
      • livingstories.googlelabs.com
  5. Memes
      • Similar phrases found across multiple sources
          – Includes multiple phrasings of same idea
      • Re-use reveals implicit network
          – Sources, Individuals, Communities
          – Patterns of re-use reinforce links
      • Questions
          – Re-use?
          – Intended re-use?
          – Visible (quoted)?
  6. Inspiration #2: Meme Tracker
  7. Where Repeated Text Occurs
      • Intended Re-use
          – Visible (Quotation): “to be or not to be”
              • Leskovec et al., KDD’09 (memetracker.org)
          – Hidden: e.g. plagiarism, false plurality
          – Unmarked
              • Near-Duplicate documents
              • Boilerplate: All rights reserved
              • Common adage: …a penny saved…
              • Style, genre, laziness, …
      • Accidental borrowing
      • Shared context (e.g. named entities)
          – E.g. named-entities: S. Skiena et al., Stony Brook (textmap.com)
      • Chance (e.g. …then he said…)
  8. Data
      • TREC Blogs08 Collection
          – http://ir.dcs.gla.ac.uk/test_collections/blogs08info.html
          – 28M permalinks (January 2008 – January 2009)
          – 250G compressed
      • ICWSM 2009 Spinn3r Blog Dataset
          – http://www.icwsm.org/data/
          – 44 million blog posts (August - September, 2008)
          – 27 GB compressed
      • ICWSM 2011 Spinn3r Blog Dataset
  9. Inspiration #3: Popular Passages
      • Kolak & Schilit, HyperText’08
      • Find re-use in scanned books
          – Find repeated phrases
          – Group related phrases
          – Rank passages
          – MapReduce processing architecture
      • Browsing interface with generated links
      • Issues: data/task, locality, details, scalability
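
The "find repeated phrases" and "rank passages" steps on the Popular Passages slide can be illustrated with a small map/reduce-style sketch. This is not Kolak & Schilit's algorithm and not the pipeline used in this work; it is a generic word n-gram approach, and the function names, the n-gram length `n`, and the `min_sources` cutoff are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Set, Tuple


def map_ngrams(source_id: str, text: str, n: int) -> Iterator[Tuple[str, str]]:
    """Map step: emit (phrase, source_id) for every word n-gram in one document."""
    words = text.lower().split()
    for i in range(len(words) - n + 1):
        yield " ".join(words[i:i + n]), source_id


def reduce_sources(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Reduce step: group the emitted pairs by phrase, keeping distinct sources."""
    grouped: Dict[str, Set[str]] = defaultdict(set)
    for phrase, source_id in pairs:
        grouped[phrase].add(source_id)
    return grouped


def repeated_phrases(
    docs: Dict[str, str], n: int = 4, min_sources: int = 2
) -> List[Tuple[str, int]]:
    """Return (phrase, number of sources), most widely shared phrases first.

    Assumption: a phrase counts as "repeated" when it occurs in at least
    `min_sources` distinct documents. Ties are broken alphabetically.
    """
    pairs = (pair for sid, text in docs.items() for pair in map_ngrams(sid, text, n))
    grouped = reduce_sources(pairs)
    ranked = [(p, len(s)) for p, s in grouped.items() if len(s) >= min_sources]
    return sorted(ranked, key=lambda item: (-item[1], item[0]))
```

Exact n-gram matching only catches verbatim re-use. Grouping different phrasings of the same idea, as the slide's "group related phrases" step implies, would need a separate similarity or clustering step.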
  10. Processing Architecture (diagram; labels listed in the order they appear)
      • Blogs08 Test Collection
      • 28M posts, 1.4TB
      • Preprocessing (Pseudo-MapReduce)
          – Decruft & Language Identification
          – HTML Strip & Near-Duplicate Detection
      • 16M posts, 960GB
      • Common Phrase Extraction
      • 15K posts, 43GB
      • 3 MapReduce Stages
      • Common Phrase Ranking
          – Daily Top 200 Phrases
      • 6.2M phrases, 2GB
      • 1 MapReduce Process
      • Common Phrase Clustering
      • 75K phrases, 2.6MB
      • 1 MapReduce Process
      • Meme Browser
      • 68K memes
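
The preprocessing stage names near-duplicate detection without saying how it is done. One common way to approach it, shown here purely as an assumption and not as the method used in this pipeline, is to compare posts by the Jaccard similarity of their word shingles. The shingle size `k` and the `threshold` value are hypothetical.

```python
from typing import List, Set


def shingles(text: str, k: int = 3) -> Set[str]:
    """Return the set of overlapping k-word shingles in a text."""
    words = text.lower().split()
    if len(words) < k:
        # Texts shorter than k words become a single shingle (or none if empty).
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def drop_near_duplicates(
    posts: List[str], threshold: float = 0.8, k: int = 3
) -> List[str]:
    """Keep a post only if it is below `threshold` similarity to every kept post."""
    kept: List[str] = []
    kept_shingles: List[Set[str]] = []
    for post in posts:
        current = shingles(post, k)
        if all(jaccard(current, other) < threshold for other in kept_shingles):
            kept.append(post)
            kept_shingles.append(current)
    return kept
```

This version compares every post against every kept post, so it is quadratic in the worst case. At the scale of millions of posts, an approximate scheme such as MinHash with locality-sensitive hashing would usually replace the exhaustive comparison.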
  11. Meme Browser
  12. Efficiency: Meme Clustering
      • From WEKA ARFF format to sparse representation
          – From ~96 hours → 11 hours
      • Indexed vs. un-indexed
          – From 11 hours → 16 minutes (single core)
          – From 34 minutes → 3 minutes (136 cores)
      • Distributed vs. single core
          – From 11 hours → 34 minutes (un-indexed)
          – From 16 minutes → 3 minutes (indexed)
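
The timings on the Efficiency slide are easier to compare as speedup factors. The sketch below only does that arithmetic: the before/after values are taken from the slide, the 96-hour figure is approximate in the source, and the dictionary labels and the `speedup` helper are names introduced here for illustration.

```python
from typing import Dict, Tuple

HOUR = 60.0  # minutes per hour

# (before, after) run times in minutes, as reported on the slide.
TIMINGS: Dict[str, Tuple[float, float]] = {
    "ARFF -> sparse representation": (96 * HOUR, 11 * HOUR),
    "indexing, single core": (11 * HOUR, 16.0),
    "indexing, 136 cores": (34.0, 3.0),
    "distributing, un-indexed": (11 * HOUR, 34.0),
    "distributing, indexed": (16.0, 3.0),
}


def speedup(before: float, after: float) -> float:
    """Speedup factor: how many times faster the 'after' run is."""
    return before / after


for label, (before, after) in TIMINGS.items():
    print(f"{label}: {speedup(before, after):.1f}x")

# End to end: roughly 96 hours down to 3 minutes.
print(f"overall: {speedup(96 * HOUR, 3.0):.0f}x")
```

Read this way, indexing on a single core (about 41x) contributes more than distributing the un-indexed job across 136 cores (about 19x), and the combined changes take the run from roughly 96 hours to 3 minutes, a factor of about 1,900.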
  13. Thank You!
      • Matt Lease
          – ml@ischool.utexas.edu
          – www.ischool.utexas.edu/~ml
          – @mattlease
      • Joint Work with
          – Hohyon (Will) Ryu
          – Nicholas Woodward
      • Meme Browser: odyssey.ischool.utexas.edu/mb
      • Support
          – FCT of Portugal / UT CoLab
          – Amazon Web Services
          – UT Austin LIFT Award
          – John P. Commons Fellowship
