Algorithms for the thematic analysis of twitter datasets
 


Finding themes in tweets using Non-negative Matrix and Tensor Factorization

Usage Rights

© All Rights Reserved

    Presentation Transcript

    • Algorithms for the Thematic Analysis of Twitter Datasets
      • Presented by: Aneesha Bakharia
      • #comtech2011 Twitter Workshop
      • Twitter: aneesha
      • Email: aneesha.bakharia@gmail.com
    • Background
      • PhD Candidate at Faculty of Science and Technology, QUT
      • Research
        • Algorithms for Interactive Content Analysis
      • Corpus sources: surveys, workshops, interviews, large document collections, Twitter, blog comments
    • Types of Qualitative Content Analysis (Hsieh and Shannon, 2006)
      • Concentrate on Summative and Conventional (Inductive)
      • Summative
        • Study begins with: keywords
        • Derivation of codes: keywords identified before and during analysis
        • Algorithms: unsupervised and semi-supervised algorithms (NMF, NTF, LDA and traditional clustering algorithms)
      • Conventional (Inductive)
        • Study begins with: observation
        • Derivation of codes: categories developed during analysis
        • Algorithms: unsupervised and semi-supervised algorithms (NMF, NTF, LDA and traditional clustering algorithms)
      • Directed (Deductive)
        • Study begins with: theory
        • Derivation of codes: categories derived from pre-existing theory prior to analysis
        • Algorithms: supervised classification algorithms (Support Vector Machines)
    • Algorithms for Summative and Conventional Content Analysis
      • Non-negative Matrix Factorisation (Lee & Seung, 1999)
        • Simultaneous document (tweet) and word clustering
        • Parts based representation
        • Positive matrix decompositions
      • Non-Negative Tensor Factorisation
        • Additional dimension (time)
        • Ideal for seeing how themes change over time
    • Related Research
      • Non-negative Matrix and Tensor Factorisation for Discussion Tracking (Bader, Berry and Langville, 2009)
      • Discussion tracking in Enron email using PARAFAC (Bader and Berry, 2008)
      • FutureLens: Software for Text Visualization and Tracking (Shutt, Puretskiy and Berry, 2009)
    • Non-negative Matrix Factorisation
      • A ~ WH
      • Specify number of themes (k)
      • Term-Tweet Matrix (A)
                  Word 1  Word 2  Word n
        Tweet 1     1       0       2
        Tweet 2     0       1       0
        Tweet 3     0       1       1
      • Weights Matrix (W)
                  Theme 1  Theme 2
        Tweet 1     1        0
        Tweet 2     0        1
        Tweet 3     0        1
      • Features Matrix (H)
                  Word 1  Word 2  Word n
        Theme 1     0.5     0       1
        Theme 2     0       0.5     0
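The factorisation on this slide can be written out as a worked equation. This is only a sketch under stated assumptions: W is taken to be the slide's Weights Matrix and H its Features Matrix (inferred from their dimensions), m, n and k denote the number of tweets, words and themes, and the Frobenius-norm objective is one common choice that the slides do not themselves state.

```latex
\[
A \approx WH, \qquad
A \in \mathbb{R}_{\ge 0}^{m \times n},\quad
W \in \mathbb{R}_{\ge 0}^{m \times k},\quad
H \in \mathbb{R}_{\ge 0}^{k \times n}
\]
\[
\min_{W \ge 0,\; H \ge 0} \; \lVert A - WH \rVert_F^{2}
\]
\[
WH =
\begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 1 \end{pmatrix}
\begin{pmatrix} 0.5 & 0 & 1 \\ 0 & 0.5 & 0 \end{pmatrix}
=
\begin{pmatrix} 0.5 & 0 & 1 \\ 0 & 0.5 & 0 \\ 0 & 0.5 & 0 \end{pmatrix}
\]
```

The product does not reproduce the slide's term-tweet matrix exactly, which suggests the numbers on the slide are illustrative placeholders rather than the output of an actual fit.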
    • Non-negative Matrix Factorisation
      • Features Matrix
                  Word 1  Word 2  Word 3
        Theme 1     0.5     0       1
        Theme 2     0       0.5     0
      • Weights Matrix
                  Theme 1  Theme 2
        Tweet 1     1        0
        Tweet 2     0        1
        Tweet 3     0        1
      • [Diagram: Theme 1 and Theme 2, each shown with its associated words and tweets]
    • Applying NMF and LDA as Content Analysis aids
    • Non-negative Matrix Factorisation
      • Tweet – Word Matrix
                  Word 1  Word 2  Word n
        Tweet 1     1       0       2
        Tweet 2     0       1       0
        Tweet 3     0       1       1
      • Tweet – Author Matrix
                         Word 1  Word 2  Word n
        Tweet Author 1     1       0       2
        Tweet Author 2     0       1       0
        Tweet Author 3     0       1       1
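The two count matrices on this slide could be built along the following lines. This is a minimal sketch: the tweets, author names and whitespace tokenizer are hypothetical, and the second matrix is assumed to hold one row per author with word counts summed over that author's tweets, which is one reading of the slide's "Tweet Author" rows.

```python
from collections import Counter

# Hypothetical toy data: (author, text) pairs, not taken from the slides.
tweets = [
    ("alice", "great keynote on hci"),
    ("bob", "vote for the student posters"),
    ("alice", "hci keynote slides posted"),
]

def tokenize(text):
    # Naive whitespace tokenizer; real tweets need proper cleaning.
    return text.lower().split()

# Shared vocabulary gives both matrices the same word columns.
vocab = sorted({w for _, text in tweets for w in tokenize(text)})

def count_row(words):
    counts = Counter(words)
    return [counts[w] for w in vocab]

# Tweet-word matrix: one row per tweet.
tweet_word = [count_row(tokenize(text)) for _, text in tweets]

# Author-word matrix: one row per author, counts summed over their tweets.
authors = sorted({a for a, _ in tweets})
author_word = [
    count_row([w for who, text in tweets if who == a for w in tokenize(text)])
    for a in authors
]

print(vocab)
for a, row in zip(authors, author_word):
    print(a, row)
```

Either matrix can then be passed to a factorisation routine; grouping by author simply changes what a "document" is.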
    • Algorithms for the Thematic Analysis of Tweets
      • Thematic Analysis with Non-negative Matrix Factorisation
        • Convert text to term-document matrix
        • NMF produces
          • word-theme matrix
          • theme-document matrix
          • Allows theme overlap
        • Need to specify number of themes (k)
          • Allows for interactivity
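The steps on this slide can be sketched end to end with NumPy. This is an illustration under stated assumptions, not the presenter's implementation: the tweets are hypothetical, the factorisation uses the standard multiplicative update rules for the Frobenius-norm objective, and the `nmf` helper, the `threshold` value and the iteration count are made up for the example.

```python
import numpy as np

# Hypothetical toy tweets (not from the #OzChi dataset).
tweets = [
    "keynote hci dance keynote",
    "hci keynote yahoo",
    "vote student posters",
    "student posters vote support",
]

# Step 1: convert text to a term-document matrix (rows: words, columns: tweets).
vocab = sorted({w for t in tweets for w in t.split()})
index = {w: i for i, w in enumerate(vocab)}
A = np.zeros((len(vocab), len(tweets)))
for j, t in enumerate(tweets):
    for w in t.split():
        A[index[w], j] += 1

def nmf(A, k, iters=200, seed=0, eps=1e-9):
    """Multiplicative-update NMF: returns W (word-theme) and H (theme-document)."""
    rng = np.random.default_rng(seed)
    W = rng.random((A.shape[0], k)) + eps
    H = rng.random((k, A.shape[1])) + eps
    for _ in range(iters):
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Step 2: the analyst chooses k; rerunning with another k is the interactive part.
k = 2
W, H = nmf(A, k)

# Step 3: read each theme from its highest-weighted words.
for theme in range(k):
    top = np.argsort(W[:, theme])[::-1][:3]
    print(f"Theme {theme + 1}:", [vocab[i] for i in top])

# Theme overlap: a tweet belongs to every theme holding more than
# `threshold` of its total weight, so it may belong to several themes.
threshold = 0.25
share = H / (H.sum(axis=0, keepdims=True) + 1e-9)
membership = share > threshold
print(membership)
```

Because membership is thresholded rather than taken from a single best theme, a tweet that mixes vocabulary from two themes can be reported under both.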
    • #OzChi Analysis – OzChi 2010 Conference
      • Theme 1 – Elizabeth Churchill Keynote: @xeeliz, dance, hci, yahoo, double rainbow, keynote
      • Theme 2 – John Seely Brown Keynote: @jseelybrown, world, extreme, learning
      • Theme 3 – 24 hr Student Challenge: @bjkraal, vote, support, posters
      • Theme 4 – Get the conf iPhone app: @parisba, conference, iphone app
    • TreeCloud Analysis of #OzChi
      • Create Treeclouds: http://www.lirmm.fr/~gambette/treecloud/
    • OzChi Abstracts (2006 – 2010)
      • http://www.randomsyntax.com/2010/11/24/uncovering-research-themes-from-5-years-of-ozchi-conferences-2006-2010/
    • Non-negative Tensor Factorisation
      • Tweet – Word – Time Matrix
      • One tweet-by-word slice per month (Jan, Feb, March, April); every slice on the slide shows the same illustrative values
                  Word 1  Word 2  Word n
        Tweet 1     1       0       2
        Tweet 2     0       1       0
        Tweet 3     0       1       1
    • Non-negative Tensor Factorisation
      • Nonnegative Tensor Factorization for Knowledge Discovery, Michael W. Berry, CISML Seminar Series, Fall 2010: http://cisml.utk.edu/Seminars/2010/Berry.pdf
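A three-way factorisation of a tweet-word-time tensor can be sketched with NumPy as follows. This is a hedged illustration, not the method used in the cited work: it fits a non-negative PARAFAC (CP) model with multiplicative updates, the `ntf` and `reconstruct` helpers are hypothetical names, and the toy tensor is synthetic, built so that one theme is active in the first two months and another in the last two.

```python
import numpy as np

def ntf(X, rank, iters=200, seed=0, eps=1e-9):
    """Non-negative CP factorisation of a 3-way tensor via multiplicative updates."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A = rng.random((I, rank)) + eps  # tweet-theme weights
    B = rng.random((J, rank)) + eps  # word-theme weights
    C = rng.random((K, rank)) + eps  # month-theme weights
    for _ in range(iters):
        A *= np.einsum("ijk,jr,kr->ir", X, B, C) / (A @ ((B.T @ B) * (C.T @ C)) + eps)
        B *= np.einsum("ijk,ir,kr->jr", X, A, C) / (B @ ((A.T @ A) * (C.T @ C)) + eps)
        C *= np.einsum("ijk,ir,jr->kr", X, A, B) / (C @ ((A.T @ A) * (B.T @ B)) + eps)
    return A, B, C

def reconstruct(A, B, C):
    # Sum of rank-one tensors, one per theme.
    return np.einsum("ir,jr,kr->ijk", A, B, C)

# Synthetic tensor: 4 tweet rows x 5 words x 4 months, built from two themes.
a1, b1, c1 = np.array([1, 1, 0, 0.0]), np.array([1, 1, 1, 0, 0.0]), np.array([1, 1, 0, 0.0])
a2, b2, c2 = np.array([0, 0, 1, 1.0]), np.array([0, 0, 0, 1, 1.0]), np.array([0, 0, 1, 1.0])
X = np.einsum("i,j,k->ijk", a1, b1, c1) + np.einsum("i,j,k->ijk", a2, b2, c2)

A, B, C = ntf(X, rank=2)
rel_error = np.linalg.norm(X - reconstruct(A, B, C)) / np.linalg.norm(X)
print("relative error:", rel_error)
print("theme strength per month:\n", C)
```

Each column of `C` gives one theme's strength per month, which is the temporal view that the extra dimension adds over plain NMF.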
    • Algorithms for the Thematic Analysis of Tweets
      • Interactive Theme Explorer developed as part of research
      • Plan to integrate with yourTwapperKeeper (Business Intelligence)
      • Share datasets and analysis
    • Toolkit
      • Python & Java
      • Algorithms
        • NMF: http://www.csie.ntu.edu.tw/~cjlin/nmf/
        • NMF-LIB: http://code.google.com/p/nmflib/
        • Latent Dirichlet Allocation (LDA), Apache Mahout: http://mahout.apache.org/
        • WEKA: http://www.cs.waikato.ac.nz/ml/weka/
    • Looking for Collaborators
      • Twitter: aneesha
      • Email: aneesha.bakharia@gmail.com
      • Twitter Graphics from Webdesigner Depot: http://www.webdesignerdepot.com
      • Graphics converted to wmf format by Elizabeth Hall