Mapping Corporate Networks - Intro
A two-part recipe for downloading company ownership data from OpenCorporates using OpenRefine, and then visualising it with Gephi
http://opencorporates.com/companies/gb/04366849/network.json?depth=2
How to grab the data using OpenRefine
Visit openrefine.org to download the application
Where’s the data?

Add /network.json?depth=2 to the end of the web address
URL of the form:

http://opencorporates.com/companies/JURISDICTION/COMPANY_ID/network.json?depth=2
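As a rough illustration, the same URL hack can be written as a small Python function. The name network_url is hypothetical (it is not part of any OpenCorporates library); it simply fills in the JURISDICTION and COMPANY_ID placeholders and appends the network path shown on this slide.

```python
# Base address for company pages, as given in this recipe
BASE = "http://opencorporates.com/companies"


def network_url(jurisdiction: str, company_id: str, depth: int = 2) -> str:
    """Build the network.json address for a company (hypothetical helper).

    Fills in the JURISDICTION/COMPANY_ID placeholders and appends the
    /network.json?depth=N suffix; depth=2 is the value used in this recipe.
    """
    company_page = f"{BASE}/{jurisdiction}/{company_id}"
    return f"{company_page}/network.json?depth={depth}"


print(network_url("gb", "04366849"))
# -> http://opencorporates.com/companies/gb/04366849/network.json?depth=2
```

In this recipe OpenRefine does the actual fetching; the sketch only builds the address you paste into it.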
What data block makes a row?
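The idea that each { } block becomes a row, and each item inside it a column, can be sketched in Python. The JSON below is made up for illustration: the field names parent_name, child_name and type are assumptions, not the actual structure of the OpenCorporates network.json response. Note that the list of blocks sits inside an outer block, so we pick the inner list as the "record" to tabulate.

```python
import json

# Hypothetical JSON: a list of data blocks nested inside an outer block.
# Field names are illustrative only, not the real OpenCorporates schema.
raw = """
{"relationships": [
  {"parent_name": "Alpha Holdings", "child_name": "Beta Ltd", "type": "subsidiary"},
  {"parent_name": "Alpha Holdings", "child_name": "Gamma Ltd", "type": "shareholder"}
]}
"""


def blocks_to_rows(blocks):
    """Map each {...} block to a row; each key in a block becomes a column."""
    columns = []
    for block in blocks:
        for key in block:
            if key not in columns:
                columns.append(key)  # keep first-seen column order
    # Missing items in a block become empty cells
    rows = [[block.get(col, "") for col in columns] for block in blocks]
    return columns, rows


data = json.loads(raw)
# Choosing data["relationships"] plays the role of picking the record node
columns, rows = blocks_to_rows(data["relationships"])
print(columns)  # ['parent_name', 'child_name', 'type']
for row in rows:
    print(row)
```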
Create project

Toggle selection and preview
Nicely tabulated data
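Once the data is tabulated, picking out a single type of relationship is the job of OpenRefine's Facet or Text Filter tools. A rough Python equivalent, using hypothetical rows of (parent, child, relationship type):

```python
from collections import Counter

# Hypothetical tabulated rows: (parent, child, relationship type)
rows = [
    ("Alpha Holdings", "Beta Ltd", "subsidiary"),
    ("Alpha Holdings", "Gamma Ltd", "shareholder"),
    ("Beta Ltd", "Delta Ltd", "subsidiary"),
]

# Like a text facet: count how many rows carry each relationship type
facet = Counter(relation for _, _, relation in rows)
print(facet)  # Counter({'subsidiary': 2, 'shareholder': 1})

# Like clicking one facet value: keep only the matching rows
subsidiaries = [row for row in rows if row[2] == "subsidiary"]
print(subsidiaries)
```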
What Gephi Expects…
Parent -> Source
Child -> Target
(You may find the network analyses work better if you use the parent as the Target and the child as the Source…)
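Renaming the parent and child columns to Source and Target, and dropping everything else, could look like this in Python. The helper to_gephi_columns and the input column names are hypothetical. It follows the slide's Parent -> Source, Child -> Target mapping; swap the two column-name arguments to use the reverse orientation suggested in brackets above.

```python
def to_gephi_columns(header, rows, parent_col, child_col):
    """Rename parent/child columns to Source/Target and drop all other columns.

    parent_col and child_col are whatever names your import produced
    (hypothetical here).
    """
    p = header.index(parent_col)
    c = header.index(child_col)
    # Parent -> Source, Child -> Target
    return ["Source", "Target"], [[row[p], row[c]] for row in rows]


header = ["child_name", "parent_name", "type"]
rows = [["Beta Ltd", "Alpha Holdings", "subsidiary"]]
new_header, new_rows = to_gephi_columns(header, rows, "parent_name", "child_name")
print(new_header, new_rows)  # ['Source', 'Target'] [['Alpha Holdings', 'Beta Ltd']]
```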

What Gephi Expects…
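The two-column file Gephi expects is an ordinary CSV with a Source,Target header row. A minimal Python sketch using the standard csv module; the function name edges_to_csv and the company names are made up for illustration.

```python
import csv
import io


def edges_to_csv(edges):
    """Write (source, target) pairs as a two-column CSV with Gephi's header names."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["Source", "Target"])  # header row Gephi looks for
    writer.writerows(edges)  # one connection per line
    return buffer.getvalue()


text = edges_to_csv([("Alpha Holdings", "Beta Ltd"), ("Alpha Holdings", "Gamma Ltd")])
print(text)
```

The csv module quotes any company name that itself contains a comma, so a name like "Beta, Inc." stays in a single cell. To save to disk instead, open the file with open(path, "w", newline="") and pass it to csv.writer.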
How to visualise the data using Gephi
Visit gephi.org to download the application
Getting Started with Gephi
Import as Edges table
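Two things happen when the edges file is imported: the file needs Source and Target headers, and a node is created the first time each new name is seen. Both steps are sketched below in Python. This is an approximation of the behaviour, not Gephi's code; in particular the header check here is an exact, case-sensitive match, which may be stricter than Gephi is. The function names are hypothetical.

```python
import csv
import io


def read_edges(csv_text):
    """Parse a Source,Target CSV into (source, target) pairs.

    Raises ValueError if either header is missing, mirroring the import error.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        raise ValueError("empty file")
    header = [name.strip() for name in rows[0]]
    if "Source" not in header or "Target" not in header:
        raise ValueError(f"expected Source and Target columns, got {header}")
    s, t = header.index("Source"), header.index("Target")
    return [(row[s], row[t]) for row in rows[1:] if row]  # skip blank lines


def nodes_from_edges(edges):
    """One node per distinct name, in first-seen order (exact string match)."""
    return list(dict.fromkeys(name for edge in edges for name in edge))


text = "Source,Target\nAlpha Holdings,Beta Ltd\nAlpha Holdings,Gamma Ltd\nBeta Ltd,Gamma Ltd\n"
edges = read_edges(text)
print(nodes_from_edges(edges))  # ['Alpha Holdings', 'Beta Ltd', 'Gamma Ltd']
```

Because nodes are matched on the exact name string, "Beta Ltd" and "BETA LTD" would become two separate nodes.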
Colour/Size

View

Stats/Filters

Layout
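The springs picture of layout can be made concrete with a toy energy function: treat every edge as an ideal spring with a preferred length, and add up how much each spring is stretched or squashed. This is only an illustration of the analogy; real layout algorithms such as ForceAtlas2 and Yifan Hu also push nodes apart and use their own force formulas, none of which is modelled here. The rest_length of 1.0 is an arbitrary assumption.

```python
import math


def spring_energy(positions, edges, rest_length=1.0):
    """Sum of (length - rest_length)^2 / 2 over all edges (ideal springs).

    A layout that stretches the springs less has a lower value.
    """
    total = 0.0
    for a, b in edges:
        (xa, ya), (xb, yb) = positions[a], positions[b]
        length = math.hypot(xb - xa, yb - ya)
        total += 0.5 * (length - rest_length) ** 2
    return total


edges = [("A", "B"), ("A", "C")]
relaxed = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (0.0, 1.0)}
stretched = {"A": (0.0, 0.0), "B": (3.0, 0.0), "C": (0.0, 1.0)}
print(spring_energy(relaxed, edges))    # 0.0
print(spring_energy(stretched, edges))  # 2.0
```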
Label tools
“Spacing”
Label display selector

Turn labels on

Label size
A matter of degree…

Degree 2: in-degree 2, out-degree 0
Degree 3: in-degree 0, out-degree 3
Degree 3: in-degree 1, out-degree 2
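Degree, in-degree and out-degree are simple counts over a directed edge list, as this Python sketch with a made-up four-company network shows. It is our own code, not Gephi's implementation of the Average Degree statistic.

```python
from collections import Counter


def degrees(edges):
    """Return (degree, in_degree, out_degree) Counters for a directed edge list."""
    out_degree = Counter(source for source, _ in edges)  # arrows leaving a node
    in_degree = Counter(target for _, target in edges)   # arrows arriving at a node
    degree = out_degree + in_degree                      # all connections
    return degree, in_degree, out_degree


edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
degree, in_deg, out_deg = degrees(edges)
print(degree["A"], in_deg["A"], out_deg["A"])  # 3 0 3
print(degree["C"], in_deg["C"], out_deg["C"])  # 2 2 0
```

A Counter returns 0 for a node it has never seen, so a node with no incoming arrows simply reports an in-degree of 0.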
Size by degree…
Calculate in-degree and out-degree

Set node size

The color wheel/palette is used to colour the nodes.
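Sizing nodes by degree means mapping each node's degree onto a range between a minimum and a maximum size. A linear mapping is sketched below; the min_size and max_size defaults are arbitrary assumptions, and Gephi's own ranking may offer other interpolations. The same numbers could also drive label size when labels are set proportional to node size.

```python
def scale_sizes(values, min_size=10.0, max_size=50.0):
    """Linearly map each node's value (e.g. degree) onto [min_size, max_size]."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        # Every node has the same value: give them all the minimum size
        return {node: min_size for node in values}
    span = hi - lo
    return {node: min_size + (v - lo) / span * (max_size - min_size)
            for node, v in values.items()}


sizes = scale_sizes({"A": 3, "B": 2, "C": 2, "D": 1})
print(sizes)  # {'A': 50.0, 'B': 30.0, 'C': 30.0, 'D': 10.0}
```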
Label Sizing
Tweaking the layout
“Expand” the layout (stretch it in two dimensions)

“Adjust” the labels so that they don’t overlap - may change relative position of nodes
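Expansion is a uniform scaling of the node positions. The sketch below scales about the layout's centre (the average node position); that choice of origin is our assumption, not a documented detail of Gephi's Expansion tool. Label Adjust is not sketched, since it depends on label sizes and overlap tests.

```python
def expand(positions, factor=1.2):
    """Stretch (factor > 1) or shrink (factor < 1) a layout about its centre."""
    if not positions:
        return {}
    n = len(positions)
    cx = sum(x for x, _ in positions.values()) / n
    cy = sum(y for _, y in positions.values()) / n
    # Move each node away from (or towards) the centre by the same factor
    return {node: (cx + factor * (x - cx), cy + factor * (y - cy))
            for node, (x, y) in positions.items()}


layout = {"A": (0.0, 0.0), "B": (2.0, 0.0), "C": (1.0, 3.0)}
print(expand(layout, 2.0))  # {'A': (-1.0, -1.0), 'B': (3.0, -1.0), 'C': (1.0, 5.0)}
```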
Network Stats

HITS – Authority and Hub values:
authoritative nodes are pointed to,
hub nodes point to others

Measure the ‘influence’ of a node in the network
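Authority and hub values come from Kleinberg's HITS algorithm, which can be computed by power iteration: a node's authority is the sum of the hub scores of the nodes pointing to it, a node's hub score is the sum of the authority scores of the nodes it points to, and both are renormalised each round. A textbook sketch in Python with a made-up edge list; Gephi's implementation may differ in details such as normalisation and stopping rule.

```python
import math


def hits(edges, iterations=50):
    """Hub and authority scores by power iteration (textbook HITS)."""
    nodes = sorted({name for edge in edges for name in edge})
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority: sum of hub scores of the nodes pointing at you
        auth = {n: 0.0 for n in nodes}
        for source, target in edges:
            auth[target] += hub[source]
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {n: v / norm for n, v in auth.items()}
        # Hub: sum of authority scores of the nodes you point at
        hub = {n: 0.0 for n in nodes}
        for source, target in edges:
            hub[source] += auth[target]
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {n: v / norm for n, v in hub.items()}
    return hub, auth


# A and B both point at C; A also points at D
hub, auth = hits([("A", "C"), ("B", "C"), ("A", "D")])
print(max(auth, key=auth.get), max(hub, key=hub.get))  # C A
```

Here C is the strongest authority because two nodes point to it, and A is the strongest hub because it points to both authorities.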
Use the tools in concert…
Colour based on Authority (HITS statistic)

Label adjust tweaks the layout so we can read the labels

Fine tune label sizing using text-size slider
SchoolOfData.org
School Of Data - mapping opencorporates networks using openrefine and Gephi

Editor's Notes

  1. Some slide prompts to support a data framing investigation around corporate data – originally prepared for the OGP Festival, London, October 2013. For more information, contact: schoolOfData.org
  2. These notes provide a worked example of how to download company ownership relationship data from OpenCorporates (opencorporates.com) using the cross-platform data cleaning tool OpenRefine (openrefine.org), and then visualise the data using the cross-platform Gephi network visualisation tool (gephi.org).
  3. OpenCorporates is a private company that has set itself the ambitious task of building a database of registered company information for every legal corporate entity in the world. One of the views OpenCorporates offers over at least some of the data in its database shows how companies are connected by beneficial ownership or shareholder relationships. Although complex, this diagram is “human readable” – the data is presented in a way that is intended to make some sort of meaningful sense to us.
  4. But as well as publishing data for us humans to read, OpenCorporates also makes data available in a way that machines can read – machine readable data. You may have heard of the term “API” in the context of data publishing websites. To all intents and purposes, an API is an interface that computers can use to get information out of websites in a way that they, and the databases they work with, can understand. The data is published in a format known as JSON (JavaScript Object Notation). But you don’t really need to know much more than that: just that it’s called JSON, and that tools that can parse and work with JSON can parse and work with the data that the OpenCorporates API publishes.
  5. If you aren’t a programmer, here’s a way of getting the data out of OpenCorporates and into a tabular form you may be more comfortable with, and which we can use to generate a network diagram to display in a tool such as Gephi… You can download the OpenRefine application from openrefine.org. When you run it on your computer, it will launch an application that runs inside a browser tab using your default web browser.
  6. We can get company ownership data (subsidiary relations, major shareholdings, etc.) from OpenCorporates by hacking the web address/URL of a company page on OpenCorporates. Start from a company page on OpenCorporates, which should have a web address of the form http://opencorporates.com/companies/JURISDICTION/COMPANY_ID and add /network.json?depth=2 to the end of it, to give something of the form http://opencorporates.com/companies/JURISDICTION/COMPANY_ID/network.json?depth=2 (Note: company network data may not be available in all jurisdictions or for all companies.)
  7. In OpenRefine, select the option to Create [a new] Project using the web address (or URL) of the JSON data page that contains the corporate ownership network data for the company we are interested in on OpenCorporates. Note that you can import data into OpenRefine from several web addresses in one go, though the data returned from each URL should have the same format or structure. Using multiple URLs results in a combined data set, which can be quite handy.
  8. Being machine readable, the data makes more sense to OpenRefine than it probably does to us! In the preview view, select a block of data that is typical of the data you want to map into a single row in a “traditional” spreadsheet-like view. Data blocks are typically contained within braces (curly brackets), these things: { }. Note that in some machine readable data, some data blocks may be contained within other data blocks… Each of the items in a single data block can be mapped into a separate cell – that is, a separate column – in a single row of data. So each data block is a row, and each item in the block is a column. OpenRefine will give you a preview of how the data will look if you click the right button!
  9. You can preview the effect of making particular block selections using Update Preview. To return to the block highlighter, use ‘Pick Record Nodes’. When you are happy with your selection, you are ready to “Create Project”.
  10. Once we’re happy with the data preview, we can import the data into a more familiar looking layout. The arrows at the top of each column pop up menus that allow us to run a wide variety of operations on a column. One of the operations lets us change the column name, so I’m going to rename the child company and parent company columns to what Gephi expects: Source and Target.
  11. This is the format that Gephi wants to see when we import data from a simple two-column, comma-separated values (CSV) text file. One of the columns needs to be called Source, another needs to be called Target. When constructing the network diagram, Gephi then knows to draw a line going from each Source element to the corresponding Target.
  12. The OpenCorporates network data in tabulated form. The default column names are not necessarily as human readable as they could be! In particular, we can identify the name of the parent company and the child company for each ownership relation. We also have access to the OpenCorporates IDs for all of those companies. The type of relationship between the companies is also described. For the moment, we will treat them all equally. (If you want to view just those company connections that relate to a particular type of relation, use the Facet or Text Filter tool applied to the appropriate column.)
  13. From the appropriate column menu, select “Edit Column” and then “Rename this column” to change the column name.
  14. We can now export the data using the Custom Tabular Exporter. Deselect all the columns, then select just the Source and Target columns: we will only export data from these two columns.
  15. Preview your data to check that it looks like the sort of data you expect to export. From the Download tab, select the CSV output type and export your data; it should be saved into your browser’s default download directory, with a file name that corresponds to the OpenRefine project name. You should now have the two-column data saved to your computer, ready to load into Gephi.
  16. Gephi is a powerful cross-platform desktop tool for visualising data that describes networks, such as social networks or corporate ownership networks. You can import data into Gephi using specialised graph/network representation formats, or from simple two-column data files where each row describes a simple connection between two elements (e.g. thing1, thing2 would say that thing1 connects to thing2). You can download the Gephi application from gephi.org. When you run it on your computer, it will launch a desktop application. Note that Gephi requires Java – if you are on a Mac, you may need to download and install Java yourself: www.java.com
  17. Launch Gephi (download it from gephi.org if you don’t already have it installed) and select Data Laboratory. If the Data Table toolbar is empty, go to the application’s File menu and select ‘New Project’. A new project will be created and you should see several toolbar options appear in the Data Table.
  18. Load the data in using the “Import Spreadsheet” tool option. Make sure that you select Edges table as the table type. If your data file does not have Source and Target column names, an error will occur and you will not be able to import the data file. (In such a case, you could always open the file in a text editor, change the column names in the file, save it, and try again. Alternatively, go into OpenRefine, change the column names there, and re-export the custom tabulated data…)
  19. The final stage of the import gives some additional information about how uploaded data will be treated. Because we are simply loading in data that describes how one company (identified by its name) is connected to another company (also identified by its name), we need to get Gephi to automatically create a node each time it sees a new company (as identified by its company name…).
  20. When the data is imported, we can preview it, either by looking at a list of nodes that have been created, or ‘edges’ – that is, connections between two companies.
  21. So now let’s see how we can start to view this data as a network visualisation. Click on the Overview button in the top palette to get an overview of the network in visual form. This is the area where we can interactively visualise the network.
  22. The default Overview layout has three main areas: in the middle is the canvas, where we can see the current layout of the network (along the left-hand side of the central panel are several tools for operating on the elements shown on the canvas, and along the bottom of the central panel are several tools for controlling how text labels are displayed); to the left are several tools for manipulating what the network looks like, such as tools for laying out the network (that is, positioning the nodes) automatically, as well as colouring and sizing the nodes; and to the right are several tools that allow us to analyse and process the graph (that is, the mathematical structure that defines the network): for example, we can run various statistics on the network, or filter the nodes that are displayed according to one or more specified criteria.
  23. Let’s start by laying out the network. Several layout tools are provided by default (you can install more from the Tools->Plugins menu); each has slightly different behaviours and can be differently effective at laying out networks with different sorts of structure. A couple of good all-round layout algorithms are ForceAtlas2 and Yifan Hu. If you imagine connected nodes held together by springs, you can think of these layout tools as trying to position the nodes so that the springs are stretched as little as possible. Sort of.
  24. At the moment, we don’t know what each node represents. By default, when labels are switched on, Gephi looks for a label column value associated with a node and displays that. But we can also display other values. In this case, we are using a company name as the node ID, so we can select id as the element to display when we switch labels on. Click on the clipboard icon on the toolbar at the bottom of the screen to raise the label selector. To actually switch labels on, click on the leftmost/darkest T button on the toolbar at the bottom of the screen. The slider on the right controls the text label size.
  25. We can also make the size of labels proportional to the size of a node – but how do we size nodes? Whilst it is possible to load in data that describes various attributes associated with each node (for example, in the case of a company node it might be the turnover or profit in the last financial year), we can also generate information about each node based on various network properties. For example, the degree of a node says how many connections it has with other nodes. Where connections are ‘directed’ – that is, represented by arrows – the number of arrows that leave a node is referred to as the out-degree of the node, and the number of arrows that come into a node as the in-degree.
  26. We can use the Average Degree statistic tool to calculate the degree, in-degree and out-degree values for each node. We can then use these values as the basis for sizing the nodes in the network visualisation.
  27. Here we have sized the nodes by Degree. The min and max size parameters can be set as required to scale the size of the nodes.
  28. We can set the label size so that it is proportional to the node size – from the black/dark A label on the toolbar at the bottom of the screen, select the [proportional to] Node Size menu option.
  29. As well as tools for generating grand-scale layouts, there are also layout tools for tweaking a particular layout. The Expansion tool just stretches (or shrinks) the layout in the x and y directions. This can be good for just putting a bit of space into a layout. The Label Adjust tool juggles nodes so that their labels don’t overlap. Note that this tool may move some nodes quite a distance compared to their neighbours and so may upset any meaningful spatial relationships obtained using the other layout tools.
  30. We can colour and size nodes according to a wide range of properties obtained from running various network statistics. As you work with network data more and more, you start to get a feel for which tools to use to help you look for particular patterns, structures and stories within the data. But that is a tutorial for another day…
  31. We can use various tools in concert to tweak the layout of the network. In this example, I have: sized the nodes by degree; set the label sizes proportional to the degree; tweaked the scale using the text-size slider; used the Authority value (obtained via the HITS statistic) to colour the nodes; and laid out the network using the ForceAtlas2 algorithm, a bit of Expansion and a dash of Label Adjust.
  32. If you want to know more, contact us…