Content analysis 2.0


Content Analysis 2.0: A Framework for Using Wordle. Presented by me at 'Exploring the language of the popular in Anglo-American Newspapers 1833-1988', an AHRC-funded research seminar at Sheffield University, 14.01.11.

Published in: Education, Technology

  1. Content Analysis 2.0: A Framework for Using Wordle
     Murray Dick
  2. What is Wordle…
       • ‘Wordle is a toy for generating “word clouds” from text that you provide. The clouds give greater prominence to words that appear more frequently in the source text. You can tweak your clouds with different fonts, layouts, and color schemes. The images you create with Wordle are yours to use however you like. You can print them out, or save them to the Wordle gallery to share with your friends.’
  3. Applications…
       • The literature…
           • McNaught, Carmel and Lam, Paul (2010) 'Using Wordle as a Supplementary Research Tool', The Qualitative Report, Volume 15, Number 3, May 2010, 630-643.
           • Monroe, Burt, Colaresi, Michael, Quinn, Kevin (2008) 'Fightin' Words: Lexical Feature Selection and Evaluation for Identifying the Content of Political Conflict', Political Analysis 16 (4): 372-403.
       • In online media…
           • Rogers, Simon I (2010) 'The text of the Queen's speech as a wordle - and how it compares to 1997', The Guardian, May 25th.
           • Rogers, Simon II (2010) 'David Cameron and Nick Clegg's statements as a wordle', The Guardian, May 12th.
           • Rogers, Simon III (2010) 'Conservative manifesto: how does it compare to Labour's?', The Guardian, April 13th.
  4. Wordle for content analysis
       • Audit texts in volume
       • Works at atomic level (limitations)
       • Lets computers do what they do best
       • Lets researchers test hypotheses prior to classification/coding
       • Takes ‘manifest content’ (Berelson, 1952, p. 18) to its logical conclusion
  5. A research problem…
       • How does the New York Times’ coverage of Wikileaks material today compare with its coverage of the Pentagon Papers from 39 years ago?
  6. Searching…
       • Save your results in Nexis…
  7. Sorting/filtering texts for Wordle
       • Remove:
           • All Index metadata:
               • Sources
               • Timestamps
               • Fields (CORRECTION:, CORRECTION DATE:, HEADLINE:, TITLE:, ABSTRACT: are all mandatory. BODY: is optional via the 'custom' selector)
           • Numbers (removed by Wordle)
           • Characters (not included in Wordle)
           • Stop words (318, some removed by Wordle)
           • More format-specific/text-specific terms (caption, Wikileaks)
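The filtering steps on slide 7 can be sketched in a few lines of Python. This is only an illustration, not the method used in the talk: it assumes each Nexis field sits on its own line starting with its label, it drops the whole labelled line rather than just the label, and the stop list here is a tiny stand-in for the 318-word list, which the slides do not reproduce.

```python
import re

# Field labels named on the slide. Assumption: each starts its own line in the export.
FIELD_LABELS = ("CORRECTION DATE:", "CORRECTION:", "HEADLINE:", "TITLE:", "ABSTRACT:")
# Tiny illustrative stop list; the 318-word list from the slide is not reproduced here.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in"}
# Format-specific/text-specific terms mentioned on the slide.
EXTRA_TERMS = {"caption", "wikileaks"}


def clean_text(raw: str) -> list[str]:
    """Return lower-cased words with metadata lines, numbers and stop words removed."""
    kept_lines = []
    for line in raw.splitlines():
        stripped = line.strip()
        if stripped.startswith(FIELD_LABELS):
            continue  # drop the whole labelled metadata line (an assumption)
        if stripped.startswith("BODY:"):
            stripped = stripped[len("BODY:"):].strip()  # keep body text, drop the label
        kept_lines.append(stripped)
    text = " ".join(kept_lines).lower()
    # Letters only (with an optional apostrophe part), so numbers and symbols fall away.
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text)
    return [w for w in words if w not in STOP_WORDS and w not in EXTRA_TERMS]
```

To keep headline text in the analysis, the label could be stripped instead of the whole line, as is done for BODY: above.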
  8. How to…
       • Save to .txt file
       • Paste into Excel (Data sort, prune)
       • Paste back into Notepad .txt file, then Find/replace (or advanced Find/replace in Word)
       • Paste incrementally into Wordle
       • Or write a macro for Word…
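As an alternative to the Word macro suggested on slide 8, the find/replace pass could be scripted. The sketch below is a hypothetical helper, not something shown in the talk: `prune_terms` and `prune_file` are illustrative names, and whole-word, case-insensitive matching is an assumption about what the manual Find/replace step was doing.

```python
import re
from pathlib import Path


def prune_terms(text: str, terms: list[str]) -> str:
    """Delete whole-word, case-insensitive matches of each term from text."""
    if not terms:
        return text
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    pruned = pattern.sub("", text)
    # Collapse the runs of spaces left behind by the deletions.
    return re.sub(r"[ \t]{2,}", " ", pruned).strip()


def prune_file(src: Path, dst: Path, terms: list[str]) -> None:
    """Read a saved .txt export, prune it, and write a copy ready to paste into Wordle."""
    dst.write_text(prune_terms(src.read_text(encoding="utf-8"), terms), encoding="utf-8")
```

Writing to a separate output file keeps the original export untouched, so the pruning list can be revised and re-run.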
  9. Wordle Settings
       • Language
           • Remove numbers
           • Make all words lower case (WikiLeaks, Wikileaks), with caveats
           • Remove common English words (list not defined; watch out for problems, e.g. 'US')
       • Font
           • Kenyan Coffee (space considerations)
       • Layout
           • Maximum words to layout: 50
           • Horizontal
       • Colour
           • W&B (avoid colour associations)
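The effect of two of the settings on slide 9 (lower-casing and the 50-word cap) can be approximated outside Wordle, which helps when checking what a cloud is really counting. This is a sketch, not Wordle's own algorithm: the `PROTECTED` set is a hypothetical workaround for the 'US' caveat, keeping that token from collapsing into the common word 'us'.

```python
from collections import Counter

# Hypothetical list of tokens that must keep their capitals (the 'US' caveat).
PROTECTED = {"US"}


def top_words(tokens, max_words=50, protected=PROTECTED):
    """Merge case variants, then return the max_words most frequent (word, count) pairs."""
    normalised = [t if t in protected else t.lower() for t in tokens]
    return Counter(normalised).most_common(max_words)
```

With this normalisation 'WikiLeaks' and 'Wikileaks' are counted as one word, while 'US' and 'us' stay separate.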
  10. Output
       • All NYT content in Nexis – 279 documents, <250k words
  11. Interpretation and analysis
       • Newspaper style
  12. Interpretation and analysis
       • Angle/coverage
  13. Interpretation and analysis
       • Ethics – see new york times wikileaks
  14. Compare and contrast
       • Guardian: All content in Nexis, 879 documents, <550k words
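The two corpora differ in size (under 250k words against under 550k), so a comparison of raw counts would favour the larger one. One possible way to put them on the same footing, not something the slides describe, is to express each word as a rate per 10,000 words before comparing. The function names below are illustrative.

```python
from collections import Counter


def per_10k(tokens):
    """Frequency of each word per 10,000 words of its own corpus."""
    total = len(tokens)
    return {w: 10_000 * c / total for w, c in Counter(tokens).items()}


def compare(tokens_a, tokens_b, top=10):
    """Words with the largest rate difference (corpus A minus corpus B)."""
    rate_a, rate_b = per_10k(tokens_a), per_10k(tokens_b)
    diffs = {w: rate_a.get(w, 0.0) - rate_b.get(w, 0.0) for w in set(rate_a) | set(rate_b)}
    return sorted(diffs.items(), key=lambda kv: abs(kv[1]), reverse=True)[:top]
```

A positive difference means the word is relatively more frequent in the first corpus, a negative one that it is more frequent in the second.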
  15. Compare with Pentagon Papers
       • 246 articles in NYT archive (July-Dec/71) for pentagon papers
       • 2,380 articles in Google News Archive (July-Dec/71) for pentagon papers
       • ALL ARE IN PDF/TIFF/IMAGE FORMAT
  16. Conclusion
       • Client-side OCR is time-consuming
       • Many online archives use OCR internally, but don’t output text
       • Researchers need text in data-portable formats
       • Professional research is being squeezed by consumer demand