Discovering and Navigating Memes
               in Social Media
                              Matt Lease
                         School of Information
                      University of Texas at Austin
                        ml@ischool.utexas.edu
                              @mattlease


                            Joint Work with
                    Hohyon Ryu & Nicholas Woodward


Paper to appear at HyperText 2012: 23rd ACM Conference on Hypertext and Social Media
April 3, 2012   SBP 2012: Intl. Conf. on Social Computing, Behavioral-Cultural Modeling, & Prediction   2
Critical Reading (Literacy)
      • Context-awareness (how work is situated)
                – Related works, Time/Place, Author…
      • Recognizing & questioning
                – Sources of Influence
                – Positions, Assumptions, Bias, …
      • New challenges online
                – Scale, authorship, citing of sources, borrowing…
      • Traditional approach: education
Inspiration #1: Living Stories




                     livingstories.googlelabs.com
Memes
• Similar phrases found across multiple sources
      – Includes multiple phrasings of same idea
• Re-use reveals implicit network
      – Sources, Individuals, Communities
      – Patterns of re-use reinforce links
• Questions
      – Re-use?
      – Intended re-use?
      – Visible (quoted)?
Inspiration #2: Meme Tracker




Where Repeated Text Occurs
      • Intended Re-use
                – Visible (Quotation): “to be or not to be”
                    • Leskovec et al., KDD’09 ( memetracker.org )
                – Hidden: e.g. plagiarism, false plurality
                – Unmarked
                    •   Near-Duplicate documents
                    •   Boilerplate: All rights reserved
                    •   Common adage: …a penny saved…
                    •   Style, genre, laziness, …
      • Accidental borrowing
      • Shared context (e.g. named entities)
                – E.g. named-entities: S. Skiena et al., Stony Brook ( textmap.com )
      • Chance (e.g. …then he said…)
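
Near-duplicate documents, listed above as one source of unmarked repetition, are often detected by comparing sets of overlapping word n-grams ("shingles"). The sketch below is a generic illustration under that assumption, not the method used in this work; the function names, the shingle size `k`, and the `threshold` value are all hypothetical.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles in text (lowercased, whitespace-tokenized)."""
    words = text.lower().split()
    if len(words) < k:
        # Short texts become a single shingle; empty texts have none.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; defined here as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def is_near_duplicate(doc_a, doc_b, k=3, threshold=0.8):
    """Treat two documents as near-duplicates if their shingle sets overlap enough.

    The threshold is an illustrative assumption, not a value from the slides.
    """
    return jaccard(shingles(doc_a, k), shingles(doc_b, k)) >= threshold
```

At collection scale, pairwise comparison like this would be too slow on its own; it is shown only to make the notion of "near-duplicate" concrete.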
Data
      • TREC Blogs08 Collection
                – http://ir.dcs.gla.ac.uk/test_collections/blogs08info.html
                – 28M permalinks (January 2008 – January 2009)
                – 250 GB compressed
      • ICWSM 2009 Spinn3r Blog Dataset
                – http://www.icwsm.org/data/
                – 44 million blog posts (August - September, 2008)
                – 27 GB compressed
      • ICWSM 2011 Spinn3r Blog Dataset

Inspiration #3: Popular Passages
      • Kolak & Schilit, HyperText’08
      • Find re-use in scanned books
                – Find repeated phrases
                – Group related phrases
                – Rank passages
                – MapReduce processing architecture
      • Browsing interface with generated links
      • Issues: data/task, locality, details, scalability
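
The "find repeated phrases" step above can be sketched as a map step that emits every n-word window with its document id, followed by a reduce step that keeps phrases seen in more than one document. This is a minimal single-machine illustration, assuming fixed-length word n-grams; the function names and the parameters `n` and `min_docs` are hypothetical and are not taken from Kolak & Schilit or from this work.

```python
from collections import defaultdict


def map_phrases(doc_id, text, n=4):
    """Map step: emit (phrase, doc_id) for every n-word window in the text."""
    words = text.lower().split()
    for i in range(len(words) - n + 1):
        yield " ".join(words[i:i + n]), doc_id


def reduce_phrases(pairs, min_docs=2):
    """Reduce step: keep phrases that occur in at least min_docs distinct documents."""
    docs_by_phrase = defaultdict(set)
    for phrase, doc_id in pairs:
        docs_by_phrase[phrase].add(doc_id)
    return {p: sorted(d) for p, d in docs_by_phrase.items() if len(d) >= min_docs}


def find_repeated_phrases(docs, n=4, min_docs=2):
    """Run the map and reduce steps over a {doc_id: text} dictionary."""
    pairs = (
        pair
        for doc_id, text in docs.items()
        for pair in map_phrases(doc_id, text, n)
    )
    return reduce_phrases(pairs, min_docs)
```

In a real MapReduce job the grouping by phrase would be done by the framework's shuffle phase rather than by an in-memory dictionary.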
Processing Architecture
                                                          Blogs08 Test Collection
                                                          28M posts, 1.4TB
                Preprocessing (Pseudo-MapReduce)
                Decruft & Language Identification
                HTML Strip & Near-Duplicate Detection     16M posts, 960GB

                Common Phrase Extraction
                                                          15K posts, 43GB
                3 MapReduce Stages

                Common Phrase Ranking
                Daily Top 200 Phrases                     6.2M phrases, 2GB
                1 MapReduce Process

                Common Phrase Clustering
                                                          75K phrases, 2.6MB
                1 MapReduce Process

                Meme Browser                              68K memes
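
The "Daily Top 200 Phrases" ranking stage can be illustrated with a small sketch that groups phrase occurrences by day and keeps the k highest-ranked phrases per day. Ranking by raw occurrence count is an assumption made here for illustration; the slides do not state the ranking criterion. The function name and the input format are also hypothetical.

```python
from collections import Counter, defaultdict


def daily_top_phrases(occurrences, k=200):
    """Return {day: [(phrase, count), ...]} with at most k phrases per day.

    occurrences is an iterable of (day, phrase) pairs, one per occurrence.
    Phrases are ranked by count (an assumed criterion), ties broken alphabetically.
    """
    counts_by_day = defaultdict(Counter)
    for day, phrase in occurrences:
        counts_by_day[day][phrase] += 1
    return {
        day: sorted(counts.items(), key=lambda pc: (-pc[1], pc[0]))[:k]
        for day, counts in counts_by_day.items()
    }
```

Grouping by day maps naturally onto a single MapReduce process, with the day as the key and the per-day sort done in the reducer.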



Meme Browser




Efficiency: Meme Clustering



 • From WEKA ARFF format to sparse representation
       – From ~96 hours to 11 hours
 • Indexed vs. un-indexed
       – From 11 hours to 16 minutes (single core)
       – From 34 minutes to 3 minutes (136 cores)
 • Distributed vs. single core
       – From 11 hours to 34 minutes (un-indexed)
       – From 16 minutes to 3 minutes (indexed)
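
One plausible reading of the sparse and indexed speedups above is sketched below: each phrase is stored only as its set of terms, and an inverted index from terms to phrases limits similarity comparisons to pairs that share at least one term. This is an assumption about what "indexed" means here, not a description of the authors' implementation; all names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def to_sparse(phrase):
    """Sparse representation: the set of distinct terms, not a vocabulary-sized vector."""
    return frozenset(phrase.lower().split())


def all_pairs(phrases):
    """Un-indexed baseline: every pair of phrases is a comparison candidate."""
    return set(combinations(range(len(phrases)), 2))


def indexed_pairs(phrases):
    """Indexed variant: only pairs that share at least one term are candidates."""
    index = defaultdict(list)  # term -> ids of phrases containing that term
    for pid, phrase in enumerate(phrases):
        for term in to_sparse(phrase):
            index[term].append(pid)
    candidates = set()
    for ids in index.values():
        # ids are appended in increasing order, so each pair is (smaller, larger).
        candidates.update(combinations(ids, 2))
    return candidates
```

With four phrases such as "yes we can", "yes we did", "lipstick on a pig", and "a pig in lipstick", the baseline considers all six pairs while the index yields only the two pairs that share vocabulary. Very common terms would still generate many candidates, so a practical version would likely filter stop words first.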
Thank You!
Matt Lease
ml@ischool.utexas.edu
www.ischool.utexas.edu/~ml
@mattlease

Joint Work with
– Hohyon (Will) Ryu
– Nicholas Woodward

Support
• FCT of Portugal / UT CoLab
• Amazon Web Services
• UT Austin LIFT Award
• John P. Commons Fellowship

Meme Browser: odyssey.ischool.utexas.edu/mb

Discovering and Navigating Memes in Social Media

  • 1. Discovering and Navigating Memes in Social Media Matt Lease School of Information University of Texas at Austin ml@ischool.utexas.edu @mattlease Joint Work with Hohyon Ryu & Nicholas Woodward Paper to appear at HyperText 2012: 23rd ACM Conference on Hypertext and Social Media
  • 2. April 3, 2012 SBP 2012: Intl. Conf. on Social Computing, Behavioral-Cultural Modeling, & Prediction 2
  • 3. Critical Reading (Literacy) • Context-awareness (how work is situated) – Related works, Time/Place, Author… • Recognizing & questioning – Sources of Influence – Positions, Assumptions, Bias, … • New challenges online – Scale, authorship, citing of sources, borrowing… • Traditional approach: education
  • 4. Inspiration #1: Living Stories livingstories.googlelabs.com
  • 5. Memes • Similar phrases found across multiple sources – Includes multiple phrasings of same idea • Re-use reveals implicit network – Sources, Individuals, Communities – Patterns of re-use reinforce links • Questions – Re-use? – Intended re-use? – Visible (quoted)?
  • 6. Inspiration #2: Meme Tracker
  • 7. Where Repeated Text Occurs • Intended Re-use – Visible (Quotation): “to be or not to be” • Leskovec et al., KDD’09 ( memetracker.org ) – Hidden: e.g. plagiarism, false plurality – Unmarked • Near-Duplicate documents • Boilerplate: All rights reserved • Common adage: …a penny saved… • Style, genre, laziness, … • Accidental borrowing • Shared context (e.g. named entities) – E.g. named-entities: S. Skiena et al., Stony Brook ( textmap.com ) • Chance (e.g. …then he said…)
  • 8. Data • TREC Blogs08 Collection – http://ir.dcs.gla.ac.uk/test_collections/blogs08info.html – 28M permalinks (January 2008 – January 2009) – 250 GB compressed • ICWSM 2009 Spinn3r Blog Dataset – http://www.icwsm.org/data/ – 44 million blog posts (August – September 2008) – 27 GB compressed • ICWSM 2011 Spinn3r Blog Dataset
  • 9. Inspiration #3: Popular Passages • Kolak & Schilit, HyperText’08 • Find re-use in scanned books – Find repeated phrases – Group related phrases – Rank passages – MapReduce processing architecture • Browsing interface with generated links • Issues: data/task, locality, details, scalability
  • 10. Processing Architecture Blogs08 Test Collection 28M posts, 1.4TB Preprocessing (Pseudo-MapReduce) Decruft & Language Identification HTML Strip & Near-Duplicate Detection 16M posts, 960GB Common Phrase Extraction 15K posts, 43GB 3 MapReduce Stages Common Phrase Ranking Daily Top 200 Phrases 6.2M phrases, 2GB 1 MapReduce Process Common Phrase Clustering 75K phrases, 2.6MB 1 MapReduce Process Meme Browser 68K memes
  • 11. Meme Browser
  • 12. Efficiency: Meme Clustering • From WEKA ARFF format to sparse representation – From ~96 hours → 11 hours • Indexed vs. un-indexed – From 11 hours → 16 minutes (single core) – From 34 minutes → 3 minutes (136 cores) • Distributed vs. single core – From 11 hours → 34 minutes (un-indexed) – From 16 minutes → 3 minutes (indexed)
  • 13. Thank You! Matt Lease ml@ischool.utexas.edu www.ischool.utexas.edu/~ml @mattlease Joint Work with – Hohyon (Will) Ryu – Nicholas Woodward Support • FCT of Portugal / UT CoLab • Amazon Web Services • UT Austin LIFT Award • John P. Commons Fellowship Meme Browser: odyssey.ischool.utexas.edu/mb
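
As a rough reading of the timings on the Efficiency: Meme Clustering slide (slide 12), the short Python sketch below converts each before/after pair into a speedup factor. The numbers are transcribed from that slide; the dictionary labels and the `speedup` helper are illustrative names, not part of the original work, and the "~96 hours" figure is approximate on the slide, so its ratio is approximate too.

```python
# Timings in minutes, transcribed from the "Efficiency: Meme Clustering" slide.
# Labels and helper names are illustrative assumptions, not the authors' code.
HOUR = 60

timings = {
    "WEKA ARFF -> sparse representation": (96 * HOUR, 11 * HOUR),  # ~96 h is approximate
    "un-indexed -> indexed (single core)": (11 * HOUR, 16),
    "un-indexed -> indexed (136 cores)": (34, 3),
    "single core -> 136 cores (un-indexed)": (11 * HOUR, 34),
    "single core -> 136 cores (indexed)": (16, 3),
}


def speedup(before_minutes, after_minutes):
    """Return how many times faster the 'after' configuration is."""
    return before_minutes / after_minutes


for label, (before, after) in timings.items():
    print(f"{label}: {speedup(before, after):.1f}x")
```

Read this way, indexing on a single core (about 41x) appears to buy more than distributing the un-indexed job over 136 cores (about 19x), and the two together take the 11-hour run down to 3 minutes, roughly 220x.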