Media Cloud


Talk at IBM's Transparent Text Forum about the Berkman research project Media Cloud

Published in: News & Politics, Business
Media Cloud

  1. 1. #pman: 32,107 posts, 1,979 authors, high RT #s, influencers
  2. 2. flickr photo by mhartford, cc
  3. 3. Nutritional Information: New York Times Ingredients: International coverage 42% (includes 8% Iraq, 5% Afghanistan, minimum weekly 5% China, and no less than 2% Africa), Washington coverage 28% (includes 7% Obama, 6% Congress, and trace amounts of Limbaugh), New York State/ Albany coverage 14%, New York City coverage 10%, and less than 6% domestic US coverage. Warning: contains less than 40% of sports coverage of the leading competitor, the New York Post, and 50% of business coverage of the Wall Street Journal. May contain less than your recommended daily allowance of Latin America News.
  4. 4. You have not read any international news today.
  5. 5. magazines radio (npr, talk) blogs (int’l, domestic, left, right) ? newspapers twitter television (network, cable) facebook email forwards websites
  6. 6. hand-coding, link analysis, automated content analysis
  7. 7. PEJ News coverage index
  8. 8.
  9. 9. Hand-coding: traditional media monitoring
     Upsides:
       - High accuracy
       - Flexibility
       - Low startup cost
     Downsides:
       - Small data sets, problems extrapolating
       - Time consuming, no real-time data
       - Intercoder reliability, difficulty of coder setup
  10. 10. Adamic and Glance
  11. 11. Link analysis: Leveraging web architectures
      Upsides:
        - Highly automatable
        - Large data sets
        - Leverage existing tools for network research
      Downsides:
        - Only consider link structure, not content
        - Danger of conflating linking with social structure
        - Need for hand-coding to make sense of clusters
        - Good for blogs, bad for MSM
  12. 12.
  13. 13. Content analysis: Just becoming possible
      Upsides:
        - Can work with unstructured text, blogs and MSM
        - Large data sets, highly automatable
        - Easy linkage with visualization platforms
      Downsides:
        - Inaccuracy
        - Language constraints
        - Major programming investments
  14. 14. What We Have Done
      1. get news stories
      2. extract story text
      3. create term list
      4. allow rich queries
  15. 15. Terming
      - Lexicon-based simple matching
      - More complex term extraction
      Lexicon entries shown: Archer Daniels Midland Company; Archer, Bill; Archer, Dennis W.; Archer, Jeffrey; Archery; Archibold, Randal C.; Architecture; Architecture and Design; Archives and Records; Archon Corp.; Archstone-Smith Trust; ArcSight Inc.; Arctic Cat Inc.; Arctic Monkeys
  16. 16.
      - 9.25 million stories
      - 900 GB of database + downloaded content
      - 162 million story / tag associations
      - 1,500 sources
      - 10,000 feeds
      - roughly 20,000 stories per day
  17. 17. Topic focus
  18. 18. Pivoting on “republican”
  19. 19. Global Attention and Power Laws
  20. 20. You say “stimulus” and I say “bailout”
  21. 21. what’s been hard:
      - topic clustering
      - replicating across languages
      - legal concerns
      - dark matter
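
Slides 15 and 20 describe lexicon-based simple matching and a comparison of which sources say "stimulus" versus "bailout". The Python sketch below is one way to picture those two steps together. It is not Media Cloud's code: the function names (`compile_lexicon`, `tag_story`, `term_share_by_source`), the source labels, and the sample stories are all hypothetical, and the whole-phrase, case-insensitive regex is an assumption about what "simple matching" could mean.

```python
import re
from collections import Counter, defaultdict

# A few entries copied from the term list on slide 15 (illustration only).
LEXICON = ["Archer Daniels Midland Company", "Architecture", "Arctic Monkeys"]


def compile_lexicon(terms):
    """Build one case-insensitive regex matching any term as a whole phrase."""
    # Longest terms first, so a multi-word term wins over a shorter one it contains.
    ordered = sorted(terms, key=len, reverse=True)
    pattern = r"\b(?:" + "|".join(re.escape(t) for t in ordered) + r")\b"
    return re.compile(pattern, re.IGNORECASE)


def tag_story(text, terms):
    """Count how often each lexicon term appears in one story's text."""
    matcher = compile_lexicon(terms)
    canonical = {t.lower(): t for t in terms}  # map matches back to lexicon spelling
    return Counter(canonical[m.group(0).lower()] for m in matcher.finditer(text))


def term_share_by_source(stories, terms):
    """For each source, the fraction of its stories mentioning each term."""
    totals = Counter()            # stories seen per source
    hits = defaultdict(Counter)   # per source: stories mentioning each term
    for source, text in stories:
        totals[source] += 1
        for term in tag_story(text, terms):  # iterate distinct terms found
            hits[source][term] += 1
    return {s: {t: hits[s][t] / totals[s] for t in terms} for s in totals}


# Hypothetical stories; "source_a" and "source_b" are placeholder labels.
STORIES = [
    ("source_a", "Congress debates the stimulus package."),
    ("source_a", "Stimulus vote is expected Friday."),
    ("source_b", "Critics call the plan a bailout."),
    ("source_b", "The bailout and the stimulus compared."),
]

tags = tag_story("Arctic Monkeys tour; arctic monkeys sell out.", LEXICON)
shares = term_share_by_source(STORIES, ["stimulus", "bailout"])
print(tags)
print(shares)
```

Counting stories that mention a term, rather than raw occurrences, keeps one long article from dominating a source's share. The "more complex term extraction" on slide 15 is not attempted here.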