Dmi12 workshops - crawling and scraping

The workshop serves as an introduction to two classic digital methods techniques for issue mapping and analysis. A discussion of the Issue Crawler and the Lippmannian device is followed by a short exercise in which we'll study the presence of skeptics among top sources of information related to climate change.

  1. Crawling and Scraping: The Issuecrawler and the Lippmannian device. Erik Borra, Michael Stevenson
  2. “Reworking method for Internet research”
  3. Issuecrawler.
  4. Crawl, starting points. Diagram: sites A, B and C.
  5. Crawl, depth one: follow all of the starting points' outlinks. Diagram: sites A, B, C and D.
  6. Crawl, depth two: follow all outlinks from the pages found in the previous depth. Diagram: sites A to H.
  7. Analysis, snowball: retain all links and sites discovered during the crawl. Diagram: sites A to H.
  8. Analysis, inter-actor: retain only links between the starting points. Diagram: sites A, B and C.
  9. Analysis, co-link: retain sites that receive links from at least two other sites. Diagram: sites B and D.
  10. Issuecrawler. Modes of analysis
  11. Issuecrawler. Micro-politics of association: Pharmaceutical multinational and environmental NGO link to (inter)governmental organizations, but these do not link back. Pharmaceutical multinational links to environmental NGO, but NGO does not link back. (Govcom.org, 1999)
  12. Issuecrawler. Micro-politics of association: Clusters of Armenian and international organizations; the latter do not link back. (Audrey Selian, 2004)
  13. Issuecrawler. Macro-politics of association: Democratic Presidential Primary Web Campaigns (Betsy Sinclair 2007; 2008)
  14. Issuecrawler. Macro-politics of association
  15. Issuecrawler. Macro-politics of association
  16. Issuecrawler. Network composition over time
  17. Issuecrawler. Micro-politics of association; macro-politics of association; network composition over time. However... “Doesn’t do content analysis”
  18. Lippmannian device. Modes of analysis
  19. Walter Lippmann (1889-1974). “A Test of the News,” 1920; Public Opinion, 1922; The Phantom Public, 1927. ‘The problem is to locate by clear and coarse objective tests the actor in a controversy who is most worthy of public support.’ (The Phantom Public, p. 120)
  20. Lippmannian device. Source cloud: showing the partisanship of an actor. Partisanship or commitment: which sources mention the expert’s name? Issue cloud: showing the issue agenda of an organization. Issue agenda: which issues are on the agenda of an organization or movement?
  21. Lippmannian device. “Source cloud”: showing the partisanship or commitment of sources to one name. Craig Venter's presence in the Synthetic Biology issue space, March 2008. Top sources on "synthetic biology" according to a Google query, with number of mentions of Venter per source, ordered.
  22. Lippmannian device. “Source cloud”: method for showing the partisanship or commitment of sources to names. (1) Gather source list (e.g. through IssueCrawler). (2) Query source list for one or more experts.
  23. Lippmannian device. “Source cloud”: showing the partisanship or commitment of sources to names. Climate Change Skeptics: Who recognizes them? (Digital Methods Initiative, 2007) https://wiki.digitalmethods.net/Dmi/ClimateChangeSkeptics
  24. Lippmannian device. “Making an Issue cloud”: an organization’s issue agenda (or commitment). Public Knowledge, a digital rights NGO, has issues. Which are they most committed to?
  25. Lippmannian device. “Issue cloud”: showing the issue commitments of the NGO Public Knowledge. Public Knowledge's issue commitment. Lower six issues on Public Knowledge's issue list, ranked according to number of mentions of issues on publicknowledge.org, 2 October 2009.
  26. Lippmannian device. “Making an Issue cloud”. Greenpeace issues, http://www.greenpeace.org/international/campaigns: Stop climate change; Protect ancient forests; Defending our Oceans; Say no to genetic engineering; Eliminate toxic chemicals; Demand Peace and Disarmament; End the nuclear age; Encourage sustainable trade. Keep most significant issue language: "climate change"; "ancient forests"; oceans; "genetic engineering"; "toxic chemicals"; disarmament; "nuclear power"; "sustainable trade".
  27. Lippmannian device. “Issue cloud”: Greenpeace’s issue agenda (distribution of commitment). Greenpeace's issue commitment. Greenpeace's campaign issue list, ranked according to number of mentions of issues on greenpeace.org, 11 October 2009.
  28. Lippmannian device. “Making an Issue cloud”: multiple sources, multiple issues. What is the agenda of the global human rights network? Which issues are at the top and at the bottom of the agenda? What is the current level of commitment to a particular issue?
  29. Lippmannian device. “Making an Issue cloud”: multiple sources, multiple issues. This is more complicated, but still doable (Govcom.org, University of Pittsburgh, UMass Amherst, ongoing)
  30. Lippmannian device. “Making an Issue cloud”: Take three good lists of human rights organizations (global south, global north, UN’s)
  31. Lippmannian device. “Making an Issue cloud”: Make a list of all issues listed on all Websites
  32. Lippmannian device. “Issue cloud”: showing the issue commitments of the global human rights network. Global human rights issue agenda. Global human rights actors' issues, ranked according to the estimated number of Google mentions on a set of global human rights actors' websites, 31 March 2009.
  33. Lippmannian device. “Issue cloud”: showing the issue commitments of the global human rights network. Global human rights issue agenda, bottom. Global human rights actors' issues, ranked according to the estimated number of Google mentions on a set of global human rights actors' websites, 31 March 2009.
  34. Lippmannian device. Partisanship check. Which side of the controversy is an actor on? Use the source cloud.
  35. Lippmannian device. (1) Check an organization’s issue agenda. What are its current commitments? (2) Check a national or global movement’s issue agenda. What are its current commitments? Use the issue cloud.
  36. Questions.
  37. Exercise: Sourcing Climate Change Skeptics.
  38. Climate Change Sceptics on the Web (Frederick Seitz).
      Research question: To what extent are climate change skeptics present in the climate change spaces on the Web?
      Findings: There is distance between the skeptics and the top of the search engine returns.
      Sources, with number of mentions in parentheses: epa.gov (0) bbc.co.uk (0) defra.gov.uk (0) unep.org (0) bom.gov.au (0) ipcc.ch (0) pewclimate.org (0) davidsuzuki.org (0) panda.org (0) mfe.govt.nz (0) ec.gc.ca (0) exploratorium.edu (0) climatechange.com.au (0) greenpeace.org (0) climatechallenge.gov.uk (0) guardian.co.uk (0) iisd.org (0) g8.gov.uk (0) campaigncc.org (1) foe.co.uk (0) state.gov (0) scidev.net (0) eea.europa.eu (0) whoi.edu (0) cbc.ca (0) energy.gov (0) marshall.org (8) climateark.org (4) un.org (0) dar.csiro.au (0) theglobeandmail.com (0) acfonline.org.au (0) gcrio.org (0) nature.com (0) grida.no (0) nature.org (0) ecokids.ca (0) royalsoc.ac.uk (0) climatechangecentral.com (0) iea.org (0) ecn.ac.uk (0) ecy.wa.gov (0) worldwildlife.org (0) realclimate.org (35) metoffice.gov.uk (0) open2.net (0) scienceagogo.com (0) eldis.org (0) ft.com (0) who.int (0) climatecrisis.net (0) faqs.org (0) ltscotland.org.uk (0) abc.net.au (0) climatechange.ca.gov (0) envirolink.org (0) mofa.go.jp (0) sourcewatch.org (21) iucn.org (0) dfat.gov.au (0) ncdc.noaa.gov (0) climatescience.gov (0) climatechangecollege.org (0) ciel.org (0) ucar.edu (0)
      Source: google.com. Query: “Frederick Seitz”. Method: Search for query “Frederick Seitz” in top 100. Organized in order. Tools: Google Scraper and Tag Cloud Generator. Date: 30 July 2007.
      Product of the Digital Methods Initiative, dmi.mediastudies.nl. Analysis by Bram Nijhof, Richard Rogers and Laura van der Vlies. Design: Anne Helmond. CC BY-NC-SA.
  39. Research question: Which climate change issue actors mention the skeptics, and what kinds of actors are more likely to mention them? Method: comparative query: skeptics in three source sets (‘top’ sources, climate change blogs and climate change science network), outputting a source cloud for each.
  40. Source sets: (1) Top ten Google returns for “climate change” (mix of media as well as governmental organizations)
  41. Source sets: (2) Climate change blogs network (IssueCrawler results: mix of blogs, social media, traditional media and governmental and non-governmental organizations)
  42. Source sets: (3) Climate change science network (IssueCrawler results: governmental, non-governmental, educational and media organizations)
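
The source cloud method in the slides (gather a source list, query it for one or more names, order the sources by number of mentions) can be sketched roughly as follows. This is a minimal illustration and not the Google Scraper or Tag Cloud Generator named in the slides: it assumes the page text of each source has already been collected locally, and the function names `count_mentions` and `source_cloud` as well as the sample hosts are hypothetical.

```python
import re

def count_mentions(text, name):
    """Count case-insensitive occurrences of an exact name in a text."""
    return len(re.findall(re.escape(name), text, flags=re.IGNORECASE))

def source_cloud(pages_by_source, names):
    """Return (source, mentions) pairs, most mentions first.

    pages_by_source maps a source (host name) to a list of page texts.
    Ties are broken alphabetically so the ordering is deterministic.
    """
    counts = {
        source: sum(count_mentions(page, name) for page in pages for name in names)
        for source, pages in pages_by_source.items()
    }
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical, hand-made sample data (not the workshop's real source sets).
sample = {
    "example-blog.org": ["Frederick Seitz wrote a letter.", "More on frederick seitz."],
    "example-agency.gov": ["An overview of climate change."],
    "example-news.com": ["An interview with Frederick Seitz."],
}

cloud = source_cloud(sample, ["Frederick Seitz"])
for source, mentions in cloud:
    print(f"{source} ({mentions})")
```

For the comparative exercise, the same function would be run once per source set, giving one ordered list per set that can then be compared or rendered as a cloud.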

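The three Issuecrawler analysis modes described in slides 7 to 9 (snowball, inter-actor, co-link) can also be expressed as small operations on a set of crawled links. The sketch below is an assumption-laden illustration, not Issuecrawler's implementation: the functions `snowball`, `inter_actor` and `co_link` are hypothetical names, and the link set is invented so that the co-link result matches the sites B and D shown on the slide.

```python
def snowball(links):
    """Retain all sites and links discovered during the crawl."""
    sites = {site for link in links for site in link}
    return sites, set(links)

def inter_actor(links, starting_points):
    """Retain only links whose source and target are both starting points."""
    return {(src, dst) for src, dst in links
            if src in starting_points and dst in starting_points}

def co_link(links, min_inlinks=2):
    """Retain sites that receive links from at least min_inlinks other sites."""
    inlinkers = {}
    for src, dst in links:
        if src != dst:  # a site linking to itself does not count
            inlinkers.setdefault(dst, set()).add(src)
    return {site for site, sources in inlinkers.items()
            if len(sources) >= min_inlinks}

# Hypothetical crawl result as (source, target) pairs; A, B and C are the starting points.
starting_points = {"A", "B", "C"}
links = {
    ("A", "B"), ("A", "D"), ("C", "B"), ("C", "D"),  # depth one
    ("B", "E"), ("B", "H"), ("D", "F"), ("D", "G"),  # depth two
}

all_sites, all_links = snowball(links)
print(sorted(all_sites))
print(sorted(inter_actor(links, starting_points)))
print(sorted(co_link(links)))
```

The threshold of two inlinking sites follows the slide's wording; it is exposed as a parameter here only for illustration.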