Crawling and Scraping
The Issuecrawler and the Lippmannian device.




   Erik Borra
   Michael Stevenson
“Reworking method for Internet research”
Issuecrawler.
CRAWL STARTING POINTS
[Diagram: sites A, B, C]
CRAWL STARTING POINTS
      DEPTH ONE
follow all starting points' outlinks
[Diagram: sites A, B, C, D]
CRAWL STARTING POINTS
      DEPTH ONE, TWO
follow all starting points' outlinks, then the outlinks from the pages found in the previous depth
[Diagram: sites A, B, C, D, E, F, G, H]
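The depth rule can be sketched as a breadth-first traversal that stops after a fixed number of iterations. This is a minimal sketch under stated assumptions, not the Issuecrawler's code: it walks a small hypothetical in-memory link graph (`LINKS`, with made-up sites A to H) instead of fetching live pages, and `crawl` is an illustrative name.

```python
# Hypothetical link graph: each site maps to the sites it links to (its outlinks).
LINKS = {
    "A": ["B", "D"],
    "B": ["C", "D"],
    "C": ["B", "D"],
    "D": ["E", "F", "G", "H"],
}


def crawl(starting_points, depth):
    """Follow outlinks breadth-first for `depth` iterations.

    Returns the set of sites visited and the list of (source, target) links seen.
    """
    visited = set(starting_points)
    frontier = list(starting_points)
    links = []
    for _ in range(depth):
        next_frontier = []
        for site in frontier:
            for target in LINKS.get(site, []):
                links.append((site, target))
                if target not in visited:
                    visited.add(target)
                    next_frontier.append(target)
        # Only pages newly found at this depth are followed in the next one.
        frontier = next_frontier
    return visited, links


sites_depth_one, _ = crawl(["A", "B", "C"], 1)
sites_depth_two, links_depth_two = crawl(["A", "B", "C"], 2)
```

With starting points A, B and C, depth one adds D; depth two follows D's outlinks and adds E to H, mirroring the diagrams.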
ANALYSIS SNOWBALL
retain all links and sites discovered during the crawl
[Diagram: sites A, B, C, D, E, F, G, H]
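Snowball analysis keeps everything the crawl found. A minimal sketch, assuming the crawl has already produced a list of (source, target) links; `CRAWLED_LINKS` is hypothetical sample data and `snowball` is an illustrative function name, not part of any Issuecrawler API.

```python
# Hypothetical (source, target) links gathered by a two-depth crawl from A, B and C.
CRAWLED_LINKS = [
    ("A", "B"), ("A", "D"), ("B", "C"), ("B", "D"), ("C", "B"), ("C", "D"),
    ("D", "E"), ("D", "F"), ("D", "G"), ("D", "H"),
]


def snowball(links):
    """Retain every link and every site discovered during the crawl."""
    edges = set(links)
    # Every site appearing at either end of a link is kept.
    sites = {site for edge in edges for site in edge}
    return sites, edges


sites, edges = snowball(CRAWLED_LINKS)
```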
ANALYSIS INTER-ACTOR
retain only links between the starting points
[Diagram: sites A, B, C]
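Inter-actor analysis filters the same crawl output down to the links among the starting points. A minimal sketch on hypothetical sample links; `inter_actor` is an illustrative name.

```python
# Hypothetical (source, target) links gathered by a two-depth crawl from A, B and C.
CRAWLED_LINKS = [
    ("A", "B"), ("A", "D"), ("B", "C"), ("B", "D"), ("C", "B"), ("C", "D"),
    ("D", "E"), ("D", "F"), ("D", "G"), ("D", "H"),
]


def inter_actor(links, starting_points):
    """Retain only links whose source and target are both starting points."""
    start = set(starting_points)
    return {(source, target) for source, target in links
            if source in start and target in start}


network = inter_actor(CRAWLED_LINKS, ["A", "B", "C"])
```

Sites discovered during the crawl (D to H) drop out, together with every link that touches them.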
ANALYSIS CO-LINK
retain sites that receive links from at least two other sites
[Diagram: sites B, D]
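Co-link analysis counts, for each site, how many distinct other sites link to it. This sketch applies the slide's rule literally (at least two other sites) to hypothetical sample links; the Issuecrawler's own co-link procedure may differ in its details, and `co_link` is an illustrative name.

```python
from collections import defaultdict

# Hypothetical (source, target) links gathered by a two-depth crawl from A, B and C.
CRAWLED_LINKS = [
    ("A", "B"), ("A", "D"), ("B", "C"), ("B", "D"), ("C", "B"), ("C", "D"),
    ("D", "E"), ("D", "F"), ("D", "G"), ("D", "H"),
]


def co_link(links, min_inlinks=2):
    """Retain sites that receive links from at least `min_inlinks` distinct other sites."""
    inlinkers = defaultdict(set)
    for source, target in links:
        if source != target:  # a site linking to itself does not count
            inlinkers[target].add(source)
    return {site for site, sources in inlinkers.items() if len(sources) >= min_inlinks}


retained = co_link(CRAWLED_LINKS)
```

B is linked from A and C, and D from A, B and C, so both are retained; every other site has at most one inlinking site.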
Issuecrawler.
   Modes of analysis
Issuecrawler.
                        Micro-politics of association




Pharmaceutical multinational and environmental NGO link to
(inter)governmental organizations, but these do not link back.

Pharmaceutical multinational links to environmental NGO, but
NGO does not link back.

                                            (Govcom.org, 1999)
Issuecrawler.
                             Micro-politics of association




Clusters of Armenian and international organizations; the latter do not link back.

                                                    (Audrey Selian, 2004)
Issuecrawler.
          Macro-politics of association




Democratic Presidential Primary Web Campaigns (Betsy Sinclair 2007; 2008)
Issuecrawler.
Network composition over time
Issuecrawler.
           Micro-politics of association
           Macro-politics of association
         Network composition over time

However... “Doesn’t do content analysis”
Lippmannian device.
         Modes of analysis
Walter Lippmann (1889-1974).
                                         “A Test of the News,” 1920
                                                Public Opinion, 1922
                                          The Phantom Public, 1927

‘The problem is to locate by clear and coarse objective tests the actor in a
controversy who is most worthy of public support.’ (p120)

                                                   -The Phantom Public
Lippmannian device.
                  Showing the partisanship of an actor.
           Showing the issue agenda of an organization.

Partisanship or commitment. Which sources mention the expert’s name? (Source cloud)
Issue agenda. Which issues are on the agenda of an organization or movement? (Issue cloud)
Lippmannian device.
                                 “Source cloud”
                                   Showing the partisanship or
                            commitment of sources to one name




Craig Venter's presence in the Synthetic Biology issue space, March 2008. Top sources on "synthetic
biology" according to a Google query, with number of mentions of Venter per source, ordered.
Lippmannian device.
              “Source cloud”
        Method for showing the partisanship or
            commitment of sources to names

1. Gather a source list (e.g., through the IssueCrawler)
2. Query the source list for one or more experts' names
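The two steps can be sketched as a count of mentions per source, ordered from most to fewest. The slides describe counts obtained through Google queries; this minimal sketch instead assumes the page text of each source has already been scraped and counts whole-word matches locally. `SOURCES` is hypothetical sample data and `source_cloud` is an illustrative name.

```python
import re

# Hypothetical source list mapping each source to its already-scraped page text.
SOURCES = {
    "a.example": "Craig Venter and the Venter Institute on synthetic biology.",
    "b.example": "A primer on synthetic biology.",
    "c.example": "An interview with Venter.",
}


def source_cloud(sources, name):
    """Count whole-word mentions of `name` per source, most mentions first."""
    pattern = re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
    counts = {source: len(pattern.findall(text)) for source, text in sources.items()}
    # Sort by descending count, then alphabetically so ties have a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Query the source list for one or more names.
clouds = {expert: source_cloud(SOURCES, expert) for expert in ["Venter"]}
```

Sources with zero mentions are kept in the output on purpose: who does not mention a name is part of the partisanship picture.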
Lippmannian device.
             “Source cloud”
                  Showing the partisanship or
              commitment of sources to names


                         Climate Change Skeptics:
                            Who recognizes them?


                            (Digital Methods Initiative, 2007)
https://wiki.digitalmethods.net/Dmi/ClimateChangeSkeptics
Lippmannian device.
  “Making an Issue cloud”
               An organization’s issue agenda
                             (or commitment)

        Public Knowledge, a digital rights NGO,
has issues. Which are they most committed to?
Dmi12   workshops - crawling and scraping
Lippmannian device.
                                “Issue cloud”
                                    Showing the issue commitments
                                     of the NGO, Public Knowledge




Public Knowledge's issue commitment. Lower six issues on Public Knowledge's issue list, ranked
according to number of mentions of issues on publicknowledge.org, 2 October 2009.
Lippmannian device.
   “Making an Issue cloud”
Greenpeace issues, http://www.greenpeace.org/international/campaigns.

Stop climate change
Protect ancient forests
Defending our Oceans
Say no to genetic engineering
Eliminate toxic chemicals
Demand Peace and Disarmament
End the nuclear age
Encourage sustainable trade

Keep most significant issue language.

"climate change"
"ancient forests"
oceans
"genetic engineering"
"toxic chemicals"
disarmament
"nuclear power"
"sustainable trade"
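Ranking an organization's issues by how often its own site mentions them can be sketched as follows. The issue phrases are the ones kept from Greenpeace's campaign list; the site text is a hypothetical stand-in for scraped pages, counting phrases in local text is an assumption (the slides rank by mentions on greenpeace.org without detailing the mechanics), and `issue_cloud` is an illustrative name.

```python
import re

# Issue language kept from Greenpeace's campaign list.
ISSUES = ["climate change", "ancient forests", "oceans", "genetic engineering",
          "toxic chemicals", "disarmament", "nuclear power", "sustainable trade"]


def issue_cloud(site_text, issues):
    """Rank issues by the number of whole-phrase mentions in one organization's site text."""
    counts = {}
    for issue in issues:
        pattern = re.compile(r"\b" + re.escape(issue) + r"\b", re.IGNORECASE)
        counts[issue] = len(pattern.findall(site_text))
    # Most-mentioned issues first; ties broken alphabetically.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical scraped text standing in for pages on the organization's site.
SAMPLE_TEXT = "Stop climate change. Climate change threatens oceans and ancient forests."
ranking = issue_cloud(SAMPLE_TEXT, ISSUES)
```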
Lippmannian device.
                               “Issue cloud”
                Greenpeace’s issue agenda (distribution of
                                            commitment)




Greenpeace's issue commitment. Greenpeace's campaign issue list, ranked according to number of
mentions of issues on greenpeace.org, 11 October 2009.
Lippmannian device.
“Making an Issue cloud”
           Multiple sources, multiple issues

                   What is the agenda of the
               global human rights network?

              Which issues are at the top and
               at the bottom of the agenda?

What is the current level of commitment to a
                              particular issue?
Lippmannian device.
“Making an Issue cloud”
        Multiple sources, multiple issues




   This is more complicated, but still doable
         (Govcom.org, University of Pittsburgh, UMass Amherst, ongoing)
Lippmannian device.
   “Making an Issue cloud”
          Take three good lists of human rights
organizations (global south, global north, UN’s)
Lippmannian device.
  “Making an Issue cloud”
Make a list of all issues listed on all Websites
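With several sources, the issue lists are merged first and each issue's mentions are then summed across all sites, which gives both the top and the bottom of the network's agenda. A minimal sketch, assuming scraped text per site is available locally; the slides' figures instead use estimated Google mentions. `ISSUE_LISTS`, `SITE_TEXT` and `network_issue_cloud` are hypothetical.

```python
import re

# Hypothetical issue lists, as found on each organization's website.
ISSUE_LISTS = {
    "north.example": ["Torture", "Death penalty"],
    "south.example": ["Land rights", "Torture"],
}

# Hypothetical scraped text for each website.
SITE_TEXT = {
    "north.example": "We campaign against torture and the death penalty. Torture must end.",
    "south.example": "Land rights are human rights. Torture persists.",
}


def network_issue_cloud(issue_lists, site_text):
    """Merge all sites' issue lists, then rank issues by total mentions across all sites."""
    # One combined list, lower-cased so the same issue listed by several sites collapses.
    issues = sorted({issue.lower() for listed in issue_lists.values() for issue in listed})
    totals = {issue: 0 for issue in issues}
    for issue in issues:
        pattern = re.compile(r"\b" + re.escape(issue) + r"\b", re.IGNORECASE)
        for text in site_text.values():
            totals[issue] += len(pattern.findall(text))
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


ranking = network_issue_cloud(ISSUE_LISTS, SITE_TEXT)
top, bottom = ranking[0], ranking[-1]
```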
Lippmannian device.
                         “Issue cloud”
                             Showing the issue commitments
                             of global human rights network




Global human rights issue agenda. Global human rights actors' issues, ranked according to the
estimated number of Google mentions on a set of global human rights actors' websites, 31 March 2009.
Lippmannian device.
                                 “Issue cloud”
                                     Showing the issue commitments
                                     of global human rights network




Global human rights issue agenda, bottom. Global human rights actors' issues, ranked according to the
estimated number of Google mentions on a set of global human rights actors' websites, 31 March 2009.
Lippmannian device.

Partisanship check. Which side of the
           controversy is an actor on?

                  Use the source cloud
Lippmannian device.

      1. Check an organization’s issue agenda.
            What are its current commitments?
2. Check a national or global movement’s issue
    agenda. What are its current commitments?

                            Use the issue cloud
Questions.
Exercise:
Sourcing Climate Change
               Skeptics.
Climate Change Sceptics on the Web (Frederick Seitz)
  Research Question: To what extent are climate change 'skeptics' present in the climate change spaces on the Web?
  Findings: There is distance between the skeptics and the top of the search engine returns.

[Source cloud: mentions of “Frederick Seitz” per source]
Sources mentioning Seitz: realclimate.org (35), sourcewatch.org (21), marshall.org (8), climateark.org (4), campaigncc.org (1).
Sources with no mentions (0): epa.gov, bbc.co.uk, defra.gov.uk, unep.org, bom.gov.au, ipcc.ch, pewclimate.org, davidsuzuki.org, panda.org, mfe.govt.nz, ec.gc.ca, exploratorium.edu, climatechange.com.au, greenpeace.org, climatechallenge.gov.uk, guardian.co.uk, iisd.org, g8.gov.uk, foe.co.uk, state.gov, scidev.net, eea.europa.eu, whoi.edu, cbc.ca, energy.gov, un.org, dar.csiro.au, theglobeandmail.com, acfonline.org.au, gcrio.org, nature.com, grida.no, nature.org, ecokids.ca, royalsoc.ac.uk, climatechangecentral.com, iea.org, ecn.ac.uk, ecy.wa.gov, worldwildlife.org, metoffice.gov.uk, open2.net, scienceagogo.com, eldis.org, ft.com, who.int, climatecrisis.net, faqs.org, ltscotland.org.uk, abc.net.au, climatechange.ca.gov, envirolink.org, mofa.go.jp, iucn.org, dfat.gov.au, ncdc.noaa.gov, climatescience.gov, climatechangecollege.org, ciel.org, ucar.edu.

Source: google.com
Query: “Frederick Seitz”
Method: Search for query “Frederick Seitz” in top 100. Organized in order.
Tools: Google Scraper and Tag Cloud Generator
Date: 30 July 2007
Product of the Digital Methods Initiative, dmi.mediastudies.nl. Analysis by Bram Nijhof, Richard Rogers and Laura van der Vlies. Design by Anne Helmond.
CC BY-NC-SA
Research Question:
Which climate change issue actors mention the
skeptics, and what kinds of actors are more
likely to mention them?

Method:
Comparative query: query for the skeptics in three source sets
(‘top’ sources, climate change blogs and the climate
change science network), outputting a source
cloud for each.
Source Sets:

(1) Top ten Google returns for “climate change” (a mix of media and governmental organizations)
(2) Climate change blogs network (IssueCrawler results: a mix of blogs, social media, traditional media, and governmental and non-governmental organizations)
(3) Climate change science network (IssueCrawler results: governmental, non-governmental, educational and media organizations)
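The comparative query can be sketched as one source cloud per source set, plus the share of sources in each set that mention a skeptic at all, which makes the sets directly comparable. A minimal sketch under stated assumptions: the source sets and their page text are hypothetical, matching is done on scraped text rather than through search-engine queries, and `compare_source_sets` is an illustrative name. Only "Frederick Seitz" comes from the exercise.

```python
import re

# Name from the exercise; further skeptics' names could be added to this list.
SKEPTICS = ["Frederick Seitz"]

# Hypothetical source sets, each mapping a source to its scraped page text.
SOURCE_SETS = {
    "top sources": {
        "a.example": "The basics of climate change science.",
        "b.example": "Climate policy news.",
    },
    "climate change blogs": {
        "c.example": "Frederick Seitz signed the petition, Seitz said.",
        "d.example": "Notes on this week's weather.",
    },
    "climate change science": {
        "e.example": "Frederick Seitz was a physicist.",
    },
}


def compare_source_sets(source_sets, names):
    """Build a source cloud per set and the share of sources that mention any name."""
    patterns = [re.compile(r"\b" + re.escape(n) + r"\b", re.IGNORECASE) for n in names]
    results = {}
    for set_name, sources in source_sets.items():
        # Source cloud: total mentions of all names per source in this set.
        cloud = {src: sum(len(p.findall(text)) for p in patterns)
                 for src, text in sources.items()}
        mentioning = sum(1 for count in cloud.values() if count > 0)
        share = mentioning / len(sources) if sources else 0.0
        results[set_name] = (cloud, share)
    return results


results = compare_source_sets(SOURCE_SETS, SKEPTICS)
```

Matching the full name is a deliberate choice: a bare surname such as "Seitz" would also count unrelated people, at the cost of missing some genuine mentions.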

More Related Content

More from Digital Methods Initiative

Query Design for Digital Methods by Richard Rogers
Query Design for Digital Methods by Richard RogersQuery Design for Digital Methods by Richard Rogers
Query Design for Digital Methods by Richard RogersDigital Methods Initiative
 
Richard Rogers, Otherwise Engaged: Critical Analytics and the New Meanings of...
Richard Rogers, Otherwise Engaged: Critical Analytics and the New Meanings of...Richard Rogers, Otherwise Engaged: Critical Analytics and the New Meanings of...
Richard Rogers, Otherwise Engaged: Critical Analytics and the New Meanings of...Digital Methods Initiative
 
Digital Methods Summer School 2015 Tool Medley
Digital Methods Summer School 2015 Tool MedleyDigital Methods Summer School 2015 Tool Medley
Digital Methods Summer School 2015 Tool MedleyDigital Methods Initiative
 
Digital Methods Summer School 2014 Tool Medley
Digital Methods Summer School 2014 Tool MedleyDigital Methods Summer School 2014 Tool Medley
Digital Methods Summer School 2014 Tool MedleyDigital Methods Initiative
 
Rogers studyingpoliticalissues mar2014_optimized_ii_
Rogers studyingpoliticalissues mar2014_optimized_ii_Rogers studyingpoliticalissues mar2014_optimized_ii_
Rogers studyingpoliticalissues mar2014_optimized_ii_Digital Methods Initiative
 
Rogers digitalmethodsaftersocialmedia nov2013_optimized_
Rogers digitalmethodsaftersocialmedia nov2013_optimized_Rogers digitalmethodsaftersocialmedia nov2013_optimized_
Rogers digitalmethodsaftersocialmedia nov2013_optimized_Digital Methods Initiative
 
Interactive visualization and exploration of network data with Gephi
Interactive visualization and exploration of network data with GephiInteractive visualization and exploration of network data with Gephi
Interactive visualization and exploration of network data with GephiDigital Methods Initiative
 
National Tracking Ecologies - Digital Methods Summer School 2013
National Tracking Ecologies - Digital Methods Summer School 2013National Tracking Ecologies - Digital Methods Summer School 2013
National Tracking Ecologies - Digital Methods Summer School 2013Digital Methods Initiative
 
Cross-Platform Profiling tutorial at the Digital Methods Summer School 2013
Cross-Platform Profiling tutorial at the Digital Methods Summer School 2013Cross-Platform Profiling tutorial at the Digital Methods Summer School 2013
Cross-Platform Profiling tutorial at the Digital Methods Summer School 2013Digital Methods Initiative
 
Tracking the Trackers tutorial at the Digital Methods Summer School 2013
Tracking the Trackers tutorial at the Digital Methods Summer School 2013Tracking the Trackers tutorial at the Digital Methods Summer School 2013
Tracking the Trackers tutorial at the Digital Methods Summer School 2013Digital Methods Initiative
 
Repurposing Wikipedia: Wikipedia as data set and analytical device
Repurposing Wikipedia: Wikipedia as data set and analytical deviceRepurposing Wikipedia: Wikipedia as data set and analytical device
Repurposing Wikipedia: Wikipedia as data set and analytical deviceDigital Methods Initiative
 
Studying Facebook via Data Extraction: a Netvizz tutorial at the Digital Meth...
Studying Facebook via Data Extraction: a Netvizz tutorial at the Digital Meth...Studying Facebook via Data Extraction: a Netvizz tutorial at the Digital Meth...
Studying Facebook via Data Extraction: a Netvizz tutorial at the Digital Meth...Digital Methods Initiative
 
Digital Methods Summer School 2013 Tool Medley
Digital Methods Summer School 2013 Tool MedleyDigital Methods Summer School 2013 Tool Medley
Digital Methods Summer School 2013 Tool MedleyDigital Methods Initiative
 
Traces of the Trackers. Tracking the Trackers: A historical analysis using th...
Traces of the Trackers. Tracking the Trackers: A historical analysis using th...Traces of the Trackers. Tracking the Trackers: A historical analysis using th...
Traces of the Trackers. Tracking the Trackers: A historical analysis using th...Digital Methods Initiative
 
Post-social methods? Issues in live research, by Noortje Marres and Esther We...
Post-social methods? Issues in live research, by Noortje Marres and Esther We...Post-social methods? Issues in live research, by Noortje Marres and Esther We...
Post-social methods? Issues in live research, by Noortje Marres and Esther We...Digital Methods Initiative
 

More from Digital Methods Initiative (20)

Query Design for Digital Methods by Richard Rogers
Query Design for Digital Methods by Richard RogersQuery Design for Digital Methods by Richard Rogers
Query Design for Digital Methods by Richard Rogers
 
Digital Methods by Richard Rogers
Digital Methods by Richard RogersDigital Methods by Richard Rogers
Digital Methods by Richard Rogers
 
Richard Rogers, Otherwise Engaged: Critical Analytics and the New Meanings of...
Richard Rogers, Otherwise Engaged: Critical Analytics and the New Meanings of...Richard Rogers, Otherwise Engaged: Critical Analytics and the New Meanings of...
Richard Rogers, Otherwise Engaged: Critical Analytics and the New Meanings of...
 
Digital Methods Tool Medley
Digital Methods Tool MedleyDigital Methods Tool Medley
Digital Methods Tool Medley
 
Digital Methods Summer School 2015 Tool Medley
Digital Methods Summer School 2015 Tool MedleyDigital Methods Summer School 2015 Tool Medley
Digital Methods Summer School 2015 Tool Medley
 
Rogers data days_2014_slides_opti
Rogers data days_2014_slides_optiRogers data days_2014_slides_opti
Rogers data days_2014_slides_opti
 
Digital Methods Summer School 2014 Tool Medley
Digital Methods Summer School 2014 Tool MedleyDigital Methods Summer School 2014 Tool Medley
Digital Methods Summer School 2014 Tool Medley
 
Rogers studyingpoliticalissues mar2014_optimized_ii_
Rogers studyingpoliticalissues mar2014_optimized_ii_Rogers studyingpoliticalissues mar2014_optimized_ii_
Rogers studyingpoliticalissues mar2014_optimized_ii_
 
Rogers digitalmethodsaftersocialmedia nov2013_optimized_
Rogers digitalmethodsaftersocialmedia nov2013_optimized_Rogers digitalmethodsaftersocialmedia nov2013_optimized_
Rogers digitalmethodsaftersocialmedia nov2013_optimized_
 
The Birth of Social Media Methods
The Birth of Social Media MethodsThe Birth of Social Media Methods
The Birth of Social Media Methods
 
Interactive visualization and exploration of network data with Gephi
Interactive visualization and exploration of network data with GephiInteractive visualization and exploration of network data with Gephi
Interactive visualization and exploration of network data with Gephi
 
National Tracking Ecologies - Digital Methods Summer School 2013
National Tracking Ecologies - Digital Methods Summer School 2013National Tracking Ecologies - Digital Methods Summer School 2013
National Tracking Ecologies - Digital Methods Summer School 2013
 
Cross-Platform Profiling tutorial at the Digital Methods Summer School 2013
Cross-Platform Profiling tutorial at the Digital Methods Summer School 2013Cross-Platform Profiling tutorial at the Digital Methods Summer School 2013
Cross-Platform Profiling tutorial at the Digital Methods Summer School 2013
 
Tracking the Trackers tutorial at the Digital Methods Summer School 2013
Tracking the Trackers tutorial at the Digital Methods Summer School 2013Tracking the Trackers tutorial at the Digital Methods Summer School 2013
Tracking the Trackers tutorial at the Digital Methods Summer School 2013
 
Repurposing Wikipedia: Wikipedia as data set and analytical device
Repurposing Wikipedia: Wikipedia as data set and analytical deviceRepurposing Wikipedia: Wikipedia as data set and analytical device
Repurposing Wikipedia: Wikipedia as data set and analytical device
 
Studying Facebook via Data Extraction: a Netvizz tutorial at the Digital Meth...
Studying Facebook via Data Extraction: a Netvizz tutorial at the Digital Meth...Studying Facebook via Data Extraction: a Netvizz tutorial at the Digital Meth...
Studying Facebook via Data Extraction: a Netvizz tutorial at the Digital Meth...
 
Digital Methods Summer School 2013 Tool Medley
Digital Methods Summer School 2013 Tool MedleyDigital Methods Summer School 2013 Tool Medley
Digital Methods Summer School 2013 Tool Medley
 
Hashtag lifelines
Hashtag lifelinesHashtag lifelines
Hashtag lifelines
 
Traces of the Trackers. Tracking the Trackers: A historical analysis using th...
Traces of the Trackers. Tracking the Trackers: A historical analysis using th...Traces of the Trackers. Tracking the Trackers: A historical analysis using th...
Traces of the Trackers. Tracking the Trackers: A historical analysis using th...
 
Post-social methods? Issues in live research, by Noortje Marres and Esther We...
Post-social methods? Issues in live research, by Noortje Marres and Esther We...Post-social methods? Issues in live research, by Noortje Marres and Esther We...
Post-social methods? Issues in live research, by Noortje Marres and Esther We...
 

Recently uploaded

Planetek Italia Srl - Corporate Profile Brochure
Planetek Italia Srl - Corporate Profile BrochurePlanetek Italia Srl - Corporate Profile Brochure
Planetek Italia Srl - Corporate Profile BrochurePlanetek Italia Srl
 
UiPath Studio Web workshop Series - Day 3
UiPath Studio Web workshop Series - Day 3UiPath Studio Web workshop Series - Day 3
UiPath Studio Web workshop Series - Day 3DianaGray10
 
CyberSecurity - Computers In Libraries 2024
CyberSecurity - Computers In Libraries 2024CyberSecurity - Computers In Libraries 2024
CyberSecurity - Computers In Libraries 2024Brian Pichman
 
UiPath Studio Web workshop series - Day 4
UiPath Studio Web workshop series - Day 4UiPath Studio Web workshop series - Day 4
UiPath Studio Web workshop series - Day 4DianaGray10
 
From the origin to the future of Open Source model and business
From the origin to the future of  Open Source model and businessFrom the origin to the future of  Open Source model and business
From the origin to the future of Open Source model and businessFrancesco Corti
 
Top 10 Squarespace Development Companies
Top 10 Squarespace Development CompaniesTop 10 Squarespace Development Companies
Top 10 Squarespace Development CompaniesTopCSSGallery
 
LF Energy Webinar - Unveiling OpenEEMeter 4.0
LF Energy Webinar - Unveiling OpenEEMeter 4.0LF Energy Webinar - Unveiling OpenEEMeter 4.0
LF Energy Webinar - Unveiling OpenEEMeter 4.0DanBrown980551
 
Outage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedIn
Outage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedInOutage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedIn
Outage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedInThousandEyes
 
Emil Eifrem at GraphSummit Copenhagen 2024 - The Art of the Possible.pptx
Emil Eifrem at GraphSummit Copenhagen 2024 - The Art of the Possible.pptxEmil Eifrem at GraphSummit Copenhagen 2024 - The Art of the Possible.pptx
Emil Eifrem at GraphSummit Copenhagen 2024 - The Art of the Possible.pptxNeo4j
 
UiPath Studio Web workshop series - Day 2
UiPath Studio Web workshop series - Day 2UiPath Studio Web workshop series - Day 2
UiPath Studio Web workshop series - Day 2DianaGray10
 
Patch notes explaining DISARM Version 1.4 update
Patch notes explaining DISARM Version 1.4 updatePatch notes explaining DISARM Version 1.4 update
Patch notes explaining DISARM Version 1.4 updateadam112203
 
My key hands-on projects in Quantum, and QAI
My key hands-on projects in Quantum, and QAIMy key hands-on projects in Quantum, and QAI
My key hands-on projects in Quantum, and QAIVijayananda Mohire
 
The Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and InsightThe Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and InsightSafe Software
 
Extra-120324-Visite-Entreprise-icare.pdf
Extra-120324-Visite-Entreprise-icare.pdfExtra-120324-Visite-Entreprise-icare.pdf
Extra-120324-Visite-Entreprise-icare.pdfInfopole1
 
Trailblazer Community - Flows Workshop (Session 2)
Trailblazer Community - Flows Workshop (Session 2)Trailblazer Community - Flows Workshop (Session 2)
Trailblazer Community - Flows Workshop (Session 2)Muhammad Tiham Siddiqui
 
2024.03.12 Cost drivers of cultivated meat production.pdf
2024.03.12 Cost drivers of cultivated meat production.pdf2024.03.12 Cost drivers of cultivated meat production.pdf
2024.03.12 Cost drivers of cultivated meat production.pdfThe Good Food Institute
 
Scenario Library et REX Discover industry- and role- based scenarios
Scenario Library et REX Discover industry- and role- based scenariosScenario Library et REX Discover industry- and role- based scenarios
Scenario Library et REX Discover industry- and role- based scenariosErol GIRAUDY
 
20140402 - Smart house demo kit
20140402 - Smart house demo kit20140402 - Smart house demo kit
20140402 - Smart house demo kitJamie (Taka) Wang
 
Novo Nordisk's journey in developing an open-source application on Neo4j
Novo Nordisk's journey in developing an open-source application on Neo4jNovo Nordisk's journey in developing an open-source application on Neo4j
Novo Nordisk's journey in developing an open-source application on Neo4jNeo4j
 

Recently uploaded (20)

Planetek Italia Srl - Corporate Profile Brochure
Planetek Italia Srl - Corporate Profile BrochurePlanetek Italia Srl - Corporate Profile Brochure
Planetek Italia Srl - Corporate Profile Brochure
 
UiPath Studio Web workshop Series - Day 3
UiPath Studio Web workshop Series - Day 3UiPath Studio Web workshop Series - Day 3
UiPath Studio Web workshop Series - Day 3
 
CyberSecurity - Computers In Libraries 2024
CyberSecurity - Computers In Libraries 2024CyberSecurity - Computers In Libraries 2024
CyberSecurity - Computers In Libraries 2024
 
UiPath Studio Web workshop series - Day 4
UiPath Studio Web workshop series - Day 4UiPath Studio Web workshop series - Day 4
UiPath Studio Web workshop series - Day 4
 
From the origin to the future of Open Source model and business
From the origin to the future of  Open Source model and businessFrom the origin to the future of  Open Source model and business
From the origin to the future of Open Source model and business
 
Top 10 Squarespace Development Companies
Top 10 Squarespace Development CompaniesTop 10 Squarespace Development Companies
Top 10 Squarespace Development Companies
 
LF Energy Webinar - Unveiling OpenEEMeter 4.0
LF Energy Webinar - Unveiling OpenEEMeter 4.0LF Energy Webinar - Unveiling OpenEEMeter 4.0
LF Energy Webinar - Unveiling OpenEEMeter 4.0
 
Outage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedIn
Outage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedInOutage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedIn
Outage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedIn
 
Emil Eifrem at GraphSummit Copenhagen 2024 - The Art of the Possible.pptx
Emil Eifrem at GraphSummit Copenhagen 2024 - The Art of the Possible.pptxEmil Eifrem at GraphSummit Copenhagen 2024 - The Art of the Possible.pptx
Emil Eifrem at GraphSummit Copenhagen 2024 - The Art of the Possible.pptx
 
UiPath Studio Web workshop series - Day 2
UiPath Studio Web workshop series - Day 2UiPath Studio Web workshop series - Day 2
UiPath Studio Web workshop series - Day 2
 
Patch notes explaining DISARM Version 1.4 update
Patch notes explaining DISARM Version 1.4 updatePatch notes explaining DISARM Version 1.4 update
Patch notes explaining DISARM Version 1.4 update
 
My key hands-on projects in Quantum, and QAI
My key hands-on projects in Quantum, and QAIMy key hands-on projects in Quantum, and QAI
My key hands-on projects in Quantum, and QAI
 
The Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and InsightThe Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and Insight
 
Extra-120324-Visite-Entreprise-icare.pdf
Extra-120324-Visite-Entreprise-icare.pdfExtra-120324-Visite-Entreprise-icare.pdf
Extra-120324-Visite-Entreprise-icare.pdf
 
Trailblazer Community - Flows Workshop (Session 2)
Trailblazer Community - Flows Workshop (Session 2)Trailblazer Community - Flows Workshop (Session 2)
Trailblazer Community - Flows Workshop (Session 2)
 
SheDev 2024
SheDev 2024SheDev 2024
SheDev 2024
 
2024.03.12 Cost drivers of cultivated meat production.pdf
2024.03.12 Cost drivers of cultivated meat production.pdf2024.03.12 Cost drivers of cultivated meat production.pdf
2024.03.12 Cost drivers of cultivated meat production.pdf
 
Scenario Library et REX Discover industry- and role- based scenarios
Scenario Library et REX Discover industry- and role- based scenariosScenario Library et REX Discover industry- and role- based scenarios
Scenario Library et REX Discover industry- and role- based scenarios
 
20140402 - Smart house demo kit
20140402 - Smart house demo kit20140402 - Smart house demo kit
20140402 - Smart house demo kit
 
Novo Nordisk's journey in developing an open-source application on Neo4j
Novo Nordisk's journey in developing an open-source application on Neo4jNovo Nordisk's journey in developing an open-source application on Neo4j
Novo Nordisk's journey in developing an open-source application on Neo4j
 

Dmi12 workshops - crawling and scraping

  • 1. Crawling and Scraping The Issuecrawler and the Lippmannian device. Erik Borra Michael Stevenson
  • 2. “Reworking method for Internet research”
  • 4. CRAWL STARTING POINTS Site A B C Body Text Body text
  • 5. CRAWL STARTING POINTS DEPTH ONE follow all starting points' outlinks Site A B C Body Text D Body text
  • 6. CRAWL STARTING POINTS DEPTH ONE TWO follow all starting points' outlinks found in the previous depth outlinks from the pages Site A B C Body Text D E F G H Body text
  • 7. ANALYSIS SNOWBALL retain all links and sites discovered during the crawl Site A B C Body Text D E F G H Body text
  • 8. ANALYSIS INTER-ACTOR retain only links between the starting points Site A B C Body Text Body text
  • 9. ANALYSIS CO-LINK retain sites that receive links from at least two other sites Site B D Body Text Body text
  • 10. Issuecrawler. Modes of analysis
  • 11. Issuecrawler. Micro-politics of association: a pharmaceutical multinational and an environmental NGO link to (inter)governmental organizations, but these do not link back. The pharmaceutical multinational links to the environmental NGO, but the NGO does not link back. (Govcom.org, 1999)
  • 12. Issuecrawler. Micro-politics of association: clusters of Armenian and international organizations; the latter do not link back. (Audrey Selian, 2004)
  • 13. Issuecrawler. Macro-politics of association: Democratic Presidential Primary Web Campaigns (Betsy Sinclair, 2007; 2008)
  • 17. Issuecrawler. Micro-politics of association; macro-politics of association; network composition over time. However... “Doesn’t do content analysis”
  • 18. Lippmannian device. Modes of analysis
  • 19. Walter Lippmann (1889-1974). “A Test of the News”, 1920; Public Opinion, 1922; The Phantom Public, 1927. ‘The problem is to locate by clear and coarse objective tests the actor in a controversy who is most worthy of public support.’ (p. 120, The Phantom Public)
  • 20. Lippmannian device. Source cloud: showing the partisanship of an actor (partisanship or commitment: which sources mention the expert’s name?). Issue cloud: showing the issue agenda of an organization (issue agenda: which issues are on the agenda of an organization or movement?).
  • 21. Lippmannian device. “Source cloud”: showing the partisanship or commitment of sources to one name. Craig Venter's presence in the synthetic biology issue space, March 2008: top sources on “synthetic biology” according to a Google query, with the number of mentions of Venter per source, ordered.
  • 22. Lippmannian device. “Source cloud”: method for showing the partisanship or commitment of sources to names. 1. Gather a source list (e.g. through the Issuecrawler). 2. Query the source list for one or more experts.
  • 23. Lippmannian device. “Source cloud”: showing the partisanship or commitment of sources to names. Climate change skeptics: who recognizes them? (Digital Methods Initiative, 2007) https://wiki.digitalmethods.net/Dmi/ClimateChangeSkeptics
  • 24. Lippmannian device. “Making an issue cloud”: an organization’s issue agenda (or commitment). Public Knowledge, a digital rights NGO, has issues. Which is it most committed to?
  • 28. Lippmannian device. “Issue cloud”: showing the issue commitments of the NGO Public Knowledge. Public Knowledge's issue commitment: the lower six issues on Public Knowledge's issue list, ranked according to the number of mentions of each issue on publicknowledge.org, 2 October 2009.
  • 30. Lippmannian device. “Making an issue cloud”: Greenpeace issues, http://www.greenpeace.org/international/campaigns. Stop climate change; Protect ancient forests; Defending our Oceans; Say no to genetic engineering; Eliminate toxic chemicals; Demand Peace and Disarmament; End the nuclear age; Encourage sustainable trade. Keep the most significant issue language: "climate change", "ancient forests", oceans, "genetic engineering", "toxic chemicals", disarmament, "nuclear power", "sustainable trade".
  • 32. Lippmannian device. “Issue cloud”: Greenpeace’s issue agenda (distribution of commitment). Greenpeace's issue commitment: Greenpeace's campaign issue list, ranked according to the number of mentions of each issue on greenpeace.org, 11 October 2009.
  • 33. Lippmannian device. “Making an issue cloud”: multiple sources, multiple issues. What is the agenda of the global human rights network? Which issues are at the top and at the bottom of the agenda? What is the current level of commitment to a particular issue?
  • 34. Lippmannian device. “Making an issue cloud”: multiple sources, multiple issues. This is more complicated, but still doable (Govcom.org, University of Pittsburgh, UMass Amherst, ongoing).
  • 35. Lippmannian device. “Making an issue cloud”: take three good lists of human rights organizations (global south, global north and the UN’s).
  • 36. Lippmannian device. “Making an issue cloud”: make a list of all issues listed on all websites.
  • 39. Lippmannian device. “Issue cloud”: showing the issue commitments of the global human rights network. Global human rights issue agenda: global human rights actors' issues, ranked according to the estimated number of Google mentions on a set of global human rights actors' websites, 31 March 2009.
  • 40. Lippmannian device. “Issue cloud”: showing the issue commitments of the global human rights network. Global human rights issue agenda, bottom: global human rights actors' issues, ranked according to the estimated number of Google mentions on a set of global human rights actors' websites, 31 March 2009.
  • 41. Lippmannian device. Partisanship check: which side of the controversy is an actor on? Use the source cloud.
  • 42. Lippmannian device. 1. Check an organization’s issue agenda: what are its current commitments? 2. Check a national or global movement’s issue agenda: what are its current commitments? Use the issue cloud.
  • 45. Climate Change Sceptics on the Web (Frederick Seitz). Research question: to what extent are climate change 'skeptics' present in the climate change spaces on the Web? Findings: there is distance between the skeptics and the top of the search engine returns.
    Mentions of “Frederick Seitz” per source: epa.gov (0), bbc.co.uk (0), defra.gov.uk (0), unep.org (0), bom.gov.au (0), ipcc.ch (0), pewclimate.org (0), davidsuzuki.org (0), panda.org (0), mfe.govt.nz (0), ec.gc.ca (0), exploratorium.edu (0), climatechange.com.au (0), greenpeace.org (0), climatechallenge.gov.uk (0), guardian.co.uk (0), iisd.org (0), g8.gov.uk (0), campaigncc.org (1), foe.co.uk (0), state.gov (0), scidev.net (0), eea.europa.eu (0), whoi.edu (0), cbc.ca (0), energy.gov (0), marshall.org (8), climateark.org (4), un.org (0), dar.csiro.au (0), theglobeandmail.com (0), acfonline.org.au (0), gcrio.org (0), nature.com (0), grida.no (0), nature.org (0), ecokids.ca (0), royalsoc.ac.uk (0), climatechangecentral.com (0), iea.org (0), ecn.ac.uk (0), ecy.wa.gov (0), worldwildlife.org (0), realclimate.org (35), metoffice.gov.uk (0), open2.net (0), scienceagogo.com (0), eldis.org (0), ft.com (0), who.int (0), climatecrisis.net (0), faqs.org (0), ltscotland.org.uk (0), abc.net.au (0), climatechange.ca.gov (0), envirolink.org (0), mofa.go.jp (0), sourcewatch.org (21), iucn.org (0), dfat.gov.au (0), ncdc.noaa.gov (0), climatescience.gov (0), climatechangecollege.org (0), ciel.org (0), ucar.edu (0).
    Source: google.com. Query: “Frederick Seitz”. Method: search for the query “Frederick Seitz” in the top 100, organized in order. Tools: Google Scraper and Tag Cloud Generator. Analysis by Bram Nijhof, Richard Rogers and Laura van der Vlies. Design: Anne Helmond. Product of the Digital Methods Initiative, dmi.mediastudies.nl. Date: 30 July 2007. CLIMATE CHANGE SCEPTICS. CC BY-NC-SA.
  • 46. Research question: which climate change issue actors mention the skeptics, and what kinds of actors are more likely to mention them? Method: a comparative query for the skeptics in three source sets (‘top’ sources, climate change blogs and the climate change science network), outputting a source cloud for each.
  • 47. Source sets: (1) Top ten Google returns for “climate change” (a mix of media and governmental organizations).
  • 48. Source sets: (2) Climate change blogs network (Issuecrawler results: a mix of blogs, social media, traditional media, and governmental and non-governmental organizations).
  • 49. Source sets: (3) Climate change science network (Issuecrawler results: governmental, non-governmental, educational and media organizations).
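The depth-limited crawl on slides 4 to 7 can be read as a breadth-first traversal from the starting points. The following is a minimal sketch under stated assumptions, not the Issuecrawler's implementation: it assumes the pages have already been fetched into a hypothetical in-memory `link_graph` dictionary mapping each site to its outlinks, and it works at the site level rather than the page level. The `crawl` function and the example graph are illustrative inventions.

```python
from collections.abc import Mapping, Sequence

Link = tuple[str, str]


def crawl(link_graph: Mapping[str, Sequence[str]],
          starting_points: Sequence[str],
          depth: int) -> set[Link]:
    """Breadth-first crawl to a fixed depth.

    Depth one follows the starting points' outlinks; each further depth
    follows the outlinks of the sites first discovered in the previous
    depth. Returns every (source, target) link seen, i.e. the 'snowball'.
    """
    # Hypothetical stand-in for fetching pages: link_graph maps site -> outlinks.
    links: set[Link] = set()
    visited = set(starting_points)
    frontier = list(starting_points)
    for _ in range(depth):
        next_frontier = []
        for site in frontier:
            for target in link_graph.get(site, ()):
                links.add((site, target))
                if target not in visited:  # expand each site only once
                    visited.add(target)
                    next_frontier.append(target)
        frontier = next_frontier
    return links


if __name__ == "__main__":
    # Made-up example graph; sites A, B and C are the starting points.
    graph = {"A": ["B", "D"], "B": ["A"], "C": ["D", "E"], "D": ["F"], "F": ["G"]}
    print(sorted(crawl(graph, ["A", "B", "C"], depth=2)))
```

With `depth=2`, site F is reached through D, but F's own outlink to G is only followed at depth three. Tracking `visited` keeps the crawl from re-expanding sites that link back to each other.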
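The three analysis modes on slides 7 to 9 (snowball, inter-actor and co-link) each reduce the set of crawled links differently. The sketch below follows the slide wording literally and is a simplification, not the Issuecrawler's actual algorithm; the function names, the `min_linkers` parameter and the example link set are hypothetical.

```python
from collections import defaultdict
from collections.abc import Iterable

Link = tuple[str, str]


def snowball(links: Iterable[Link]) -> tuple[set[str], set[Link]]:
    """Retain all links and every site discovered during the crawl."""
    kept = set(links)
    sites = {site for link in kept for site in link}
    return sites, kept


def inter_actor(links: Iterable[Link], starting_points: Iterable[str]) -> set[Link]:
    """Retain only links between the starting points (self-links dropped)."""
    seeds = set(starting_points)
    return {(s, t) for s, t in links if s in seeds and t in seeds and s != t}


def co_link(links: Iterable[Link], min_linkers: int = 2) -> set[str]:
    """Retain sites that receive links from at least `min_linkers` other sites."""
    linkers: defaultdict[str, set[str]] = defaultdict(set)
    for source, target in links:
        if source != target:  # a site linking to itself does not count
            linkers[target].add(source)
    return {site for site, sources in linkers.items() if len(sources) >= min_linkers}


if __name__ == "__main__":
    # Made-up crawl result; A, B and C are the starting points.
    crawled = {("A", "B"), ("A", "D"), ("B", "A"), ("C", "D"), ("C", "E"), ("D", "F")}
    print(sorted(snowball(crawled)[0]))
    print(sorted(inter_actor(crawled, ["A", "B", "C"])))
    print(sorted(co_link(crawled)))
```

In this made-up example only D is linked from two different sites (A and C), so co-link analysis keeps D alone. Counting distinct linking sites with a set, rather than raw link counts, stops one site from qualifying a target on its own.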
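The source cloud (slides 21 to 23) and the issue cloud (slides 24 to 40) both come down to counting mentions of a term on a source and ranking by that count. The original work obtained these counts from Google queries per source; the sketch below instead assumes, hypothetically, that each source's text has already been scraped into a `pages` dictionary, and counts case-insensitive whole-phrase matches with Python's `re` module. The function names and example texts are illustrative assumptions, not the Lippmannian device's code.

```python
import re
from collections.abc import Iterable, Mapping


def count_mentions(text: str, term: str) -> int:
    """Count case-insensitive, whole-word occurrences of a name or phrase.

    Assumes the term starts and ends with a word character, so that the
    \\b word boundaries behave as expected.
    """
    pattern = r"\b" + re.escape(term) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


def rank(counts: Mapping[str, int]) -> list[tuple[str, int]]:
    """Order by count (highest first), breaking ties alphabetically."""
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def source_cloud(pages: Mapping[str, str], name: str) -> list[tuple[str, int]]:
    """Source cloud: mentions of one name per source, ranked."""
    return rank({source: count_mentions(text, name) for source, text in pages.items()})


def issue_cloud(pages: Mapping[str, str], issues: Iterable[str]) -> list[tuple[str, int]]:
    """Issue cloud: mentions of each issue summed over all sources, ranked.

    With a single source this is one organization's issue agenda; with many
    sources it approximates a network's agenda.
    """
    return rank({issue: sum(count_mentions(text, issue) for text in pages.values())
                 for issue in issues})


if __name__ == "__main__":
    # Made-up page texts for hypothetical sources.
    pages = {
        "a.example": "Climate change and oceans. Frederick Seitz on climate change.",
        "b.example": "Toxic chemicals in the oceans.",
    }
    print(source_cloud(pages, "Frederick Seitz"))
    print(issue_cloud(pages, ["climate change", "oceans", "toxic chemicals", "disarmament"]))
```

Keeping zero counts in the output mirrors the Frederick Seitz source cloud above, where most sources are listed with (0): absence of mention is itself a finding. The quoted multi-word issue terms from the Greenpeace slide map naturally onto whole-phrase matching.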