Mapping Corporate Networks With OpenCorporates

  1. OpenCorporates Co-Director Mapping. Tony Hirst, Dept of Communications and Systems, The Open University
  2. As company filings start to appear as open data, opportunities may arise for watchdogs to start mining this data in support of their investigations and monitoring activities. This presentation introduces several ideas relating to mapping network structures in order to learn something about the structure of “corporate sprawls”: corporate groupings defined on the basis of co-director relationships.
  3. Social Media Mapping: Introducing “Graphs”
  4. To introduce the idea of a network map, let’s have a look at a view we can construct over the Twitter social space…
  5. Emergent Social Positioning
  6. This network map shows Twitter users who are commonly followed by the followers of @TOGYnews. Although hard to see at this scale, the map is actually constructed from labeled points connected by lines (in the jargon, “nodes connected by edges”). The algorithm used to position the labeled nodes tries to place nodes that are heavily connected to each other close to each other. In a sense, we can view the diagram as a map, with regions highlighted in false colours identifying clusters of nodes that may in some sense be similar to each other based on the sharing of common followers.
  7. Find the Followers [diagram: nodes A, B and focus; edge label “is followed by”]
  8. The map is constructed using data grabbed from the Twitter API. Using one or more “focus” users (a specific Twitter account, for example, or the set of users of a particular hashtag), we grab a list of their followers.
  9. Find Friends of Followers [diagram: nodes A, B, two peers and focus; edge label “is followed by”]
  10. For each of the followers, we grab a list of their friends (or a sample thereof), that is, a list of some or all of the people they follow on Twitter. We can use this data to construct a network of people followed by the followers of the original focus. It is typically at this point, where there is most relational information contained within the network, that we lay it out using automatic layout tools.
  11. Find Common Friends of Followers [diagram: nodes A, B, peer and focus; edge label “is followed by”]
  12. Drawing on the insight that people on Twitter are likely to follow accounts that are of interest to them, we can start to imagine the network as a projection of the interests of the people who are interested in one or more of the things the focus is associated with. However, the interests of followers may spread across a wide range of topics, so we look for consistency of interest, pruning the network to remove people who are not commonly followed by the followers of the focus. That is, we remove nodes that are followed by only a few of the followers of the focus.
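A minimal sketch of the pruning step described above, assuming the follower and friend lists have already been fetched into a plain dictionary. The function name, the sample accounts and the threshold are all hypothetical; no Twitter API calls are shown.

```python
from collections import Counter


def commonly_followed(friends_of, min_followers):
    """Keep accounts followed by at least `min_followers` of the focus's followers."""
    counts = Counter()
    for friends in friends_of.values():
        # Use a set so each follower contributes at most one vote per account.
        counts.update(set(friends))
    return {account: n for account, n in counts.items() if n >= min_followers}


# Hypothetical sample data: each follower of the focus mapped to their friends.
friends_of = {
    "follower_a": ["peer_1", "peer_2", "peer_3"],
    "follower_b": ["peer_1", "peer_2"],
    "follower_c": ["peer_1", "peer_4"],
}

kept = commonly_followed(friends_of, min_followers=2)
print(kept)
```

Raising `min_followers` tightens the map around accounts that many followers share; what counts as “commonly followed” is a judgment call for each dataset.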
  13. Filter Out Not Commonly Followed [diagram: nodes peer and focus]
  14. Having laid out the network map, we might now tidy it up a little by removing all the nodes that are not themselves followed by a significant number of the followers of the original focus.
  15. Emergent Social Positioning
  16. The result is a map that shows groups of people positioned according to the shared, projected, presumed interests of their followers.
  17. A More Principled Approach
  18. It may also be possible to use metadata associated with social networks to develop additional insights. A recent paper describes one way of mining social network data for information about people working for a particular company, and using public biographical information along with social connection data to map out the organisational structures of large companies.
  19. Corporate Structure Maps: Introducing “Graphs”
  20. A more principled way of looking at corporate structures at a company level may possibly be derived from publicly available corporate information.
  21. Companies & Directors [diagram: companies C1, C2, C3 and directors D1, D2, D3]
  22. For example, if we can get hold of directorial appointment and termination data, we can start to construct maps that show how companies are connected by common directors, as well as which companies are co-directed by particular directors. As with the emergent social positioning network maps, if particular directors have particular corporate interests, we may be able to identify particular organisational groupings in corporate sprawls made up from dozens of operating companies working across a range of business areas.
  23. Company Records on OpenCorporates
  24. One possible source of open company information is OpenCorporates. OpenCorporates’ ambitious aim is to mint a unique corporate identifier for every corporate legal entity in the world [CHECK], as well as collating and normalising (or “harmonising”) company information about company filings, trademarks, patents(?) and officers (that is, company directors, company secretaries and so on). For GB registered companies, there is a growing repository of data relating to company directorships, which provides us with an opportunity to develop maps that show how companies are connected by virtue of having common directors.
  25. Subsidiary Companies have “working” directors
  26. Just a note: my experience in looking at data related to GB registered companies suggests that the directors of the “top”/nominal company in a large multinational grouping are “atypical” compared to the officers appointed to UK based operating companies in the same corporate sprawl. They tend to be appointed from the great and the good, or from senior officers who do not take directorships across operating divisions or companies, rather than being representative of the directors of operating companies. When seeding corporate sprawl trawlers (algorithms that try to identify companies that make up a corporate sprawl based on co-directorships), my experience suggests that it often makes sense to seed the search with one or more operating companies whose directors are likely to be directors of other operating companies, rather than with the “top level” company.
  27. Co-Director Mapping: More Graphs
  28. We can reuse the ideas that underpin the construction of the emergent social positioning graph to map out corporate structures based on director information.
  29. Director Records on OpenCorporates
  30. As well as corporate information pages, OpenCorporates maintains information pages about directorial appointments. At the moment, there are no authority files providing identifiers that identify the same physical person: each directorial appointment to a company provides the director with a unique officer ID. It is possible to search for officers of other companies with the same name as a particular director, but there are no identifiers that link them as the same physical person. (That said, there does appear to be a slot in the metadata for authoritative identifiers.)
  31. Start With One or More Seed Companies
  32. So how might we go about constructing a corporate sprawl? Let’s start with one or more seed companies.
  33. Find Friends of Followers [diagram: company C1 and directors D1, D2; edge label “has director”]
  34. The general shape of this diagram might remind you of something…? For each of the seed companies, we grab a list of their directors. We can use this data to construct a network of people who are directors or other officers of the original seed company or companies.
  35. Find Directors of Seed Company(s)
  36. Here’s another way of imagining it: a company surrounded by its directors.
  37. Find Friends of Followers [diagram: companies C1, C2 and directors D1, D2; edge label “has director”]
  38. For each of the directors, we run a search for them on OpenCorporates, to see what directorial appointments have been made to other companies for people of exactly the same name. We can use this data to construct a network of companies directed by the directors of the original seed company. Those companies that are directed by N or more of the directors associated with the seed company or companies (where N is typically 2) we might now say are part of the corporate sprawl. Companies sharing fewer than N directors with companies admitted to the corporate sprawl are added to a list of possible candidate companies. As we find more directors associated with companies included in the sprawl, we might be able to “legitimise” membership of these companies within the sprawl.
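The admission rule above can be sketched as follows. This is an illustration only: the appointment records are made-up (name, company) pairs standing in for search results, and `classify_companies` is a hypothetical helper, not the scraper's actual code. As in the text, directors are matched on exact name, so different people who share a name would be conflated.

```python
from collections import defaultdict


def classify_companies(appointments, sprawl_directors, n=2):
    """Split companies into sprawl members and candidates by shared director names."""
    shared = defaultdict(set)
    for director, company in appointments:
        if director in sprawl_directors:
            shared[company].add(director)
    members = {c for c, names in shared.items() if len(names) >= n}
    candidates = {c for c, names in shared.items() if len(names) < n}
    return members, candidates


# Hypothetical appointment records, matched on exact name only.
appointments = [
    ("ALICE SMITH", "Seed Co"),
    ("BOB JONES", "Seed Co"),
    ("ALICE SMITH", "Op Co One"),
    ("BOB JONES", "Op Co One"),
    ("ALICE SMITH", "Side Venture"),
    ("CAROL WHITE", "Side Venture"),
]
sprawl_directors = {"ALICE SMITH", "BOB JONES"}

members, candidates = classify_companies(appointments, sprawl_directors, n=2)
print(members, candidates)
```

Here "Side Venture" shares only one known director, so it stays on the candidate list until another sprawl director turns up on its board.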
  39. Find Companies With Two or More Seed Directors
  40. We now have a larger set of companies, reflecting those companies who share N or more directors with the original seed company or companies.
  41. Find Friends of Followers [diagram: companies C1, C2 and directors D1, D2, D3; edge label “has director”]
  42. If we so decide, we can continue with this snowball discovery process, looking up further directors associated with companies we have included in our sprawl, with a view to trying to discover more companies that should be included in the sprawl.
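One way to picture the snowball process is as a loop that keeps admitting companies until nothing new qualifies. The sketch below is an assumption-laden toy: the two dictionaries stand in for lookups against OpenCorporates, and `grow_sprawl`, `n` and `max_rounds` are hypothetical names, not taken from the actual scraper.

```python
def grow_sprawl(seed_companies, directors_of, companies_of, n=2, max_rounds=5):
    """Repeatedly admit companies that share at least n directors with the sprawl."""
    sprawl = set(seed_companies)
    for _ in range(max_rounds):
        # Pool the directors of every company admitted so far.
        directors = set()
        for company in sprawl:
            directors |= directors_of.get(company, set())
        # Count how many pooled directors sit on each company's board.
        counts = {}
        for director in directors:
            for company in companies_of.get(director, set()):
                counts[company] = counts.get(company, 0) + 1
        admitted = {c for c, k in counts.items() if k >= n} - sprawl
        if not admitted:
            break  # nothing new qualified, so the snowball has stopped growing
        sprawl |= admitted
    return sprawl


# Hypothetical data standing in for director lookups.
directors_of = {
    "C1": {"D1", "D2"},
    "C2": {"D1", "D2", "D3"},
    "C3": {"D2", "D3"},
    "C4": {"D3"},
}
companies_of = {}
for company, names in directors_of.items():
    for name in names:
        companies_of.setdefault(name, set()).add(company)

sprawl = grow_sprawl({"C1"}, directors_of, companies_of, n=2)
print(sorted(sprawl))
```

In this toy run C2 is admitted in the first round, and only then does C3 qualify, via director D3 who was discovered through C2. C4 never reaches two shared directors and stays out.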
  43. Using this snowball approach, I have constructed a scraper on Scraperwiki that mines OpenCorporates, given one or more seed companies (or seed directors), to map out corporate sprawls, limiting myself to the capture of current directors and active companies registered in the UK. (The code needs checking and is perhaps not as easy to use as it might be. Developing a more robust and user friendly tool may be worth exploring if this approach is seen to be useful.)
  44. Companies & Directors [diagram: companies C1, C2, C3 and directors D1, D2, D3]
  45. So, we can generate a network that connects companies with their directors, and grow this network out to identify companies that share several directors. As with the emergent social positioning map, we can use automatic layout tools to try to position companies and directors close to each other based on their connectivity, producing a map over the corporate sprawl.
  46. Companies [diagram: companies C1, C2, C3]
  47. We can view this network in various ways. For example, we might choose to view just the companies.
  48. PageRank
  49. This map shows companies in a corporate sprawl grown out from Royal Dutch Shell. Note the presence of BP in there: somehow, these two groupings are connected by shared directorships of some intermediate company.
  50. Companies & Directors [diagram: companies C1, C2, C3 and directors D1, D2, D3]
  51. One of the nice things about representing this sort of structure in an abstract mathematical or computational way is that we can wrangle it with code... So for example, companies C1 and C2 are connected by a single shared director, whereas C2 and C3 are connected by two directors.
  52. Companies Sharing Directors [diagram: companies C1, C2, C3]
  53. We can represent this by transforming the original bipartite (two types of node) graph, which connects directors to companies and companies to directors, into a graph that just connects companies that were connected by directors. The thickness of the line (or “edge”) connecting the companies represents its “weight”, which in this case is given by the number of shared directors between connected companies.
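The collapse from a bipartite graph to a weighted company graph can be sketched in a few lines. The director-to-company mapping below is an assumption: it is one small graph chosen to be consistent with the C1/C2/C3 example in the text, not data read off the slide.

```python
from collections import Counter
from itertools import combinations


def company_projection(companies_of):
    """Weight each company pair by the number of directors the two companies share."""
    weights = Counter()
    for companies in companies_of.values():
        # Every pair of companies on one director's list gains one shared director.
        for pair in combinations(sorted(companies), 2):
            weights[pair] += 1
    return dict(weights)


# Assumed toy bipartite graph: each director mapped to the companies they direct.
companies_of = {
    "D1": {"C1", "C2"},
    "D2": {"C2", "C3"},
    "D3": {"C2", "C3"},
}

edges = company_projection(companies_of)
print(edges)
```

Sorting each director's companies before pairing keeps every edge key in a consistent order, so ("C2", "C3") and ("C3", "C2") are never counted separately.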
  54. Companies Sharing Two or More Directors [diagram: companies C2, C3]
  55. We can also filter the graph, for example by adding together the weights of all the edges incident on a node, and throwing away all nodes for which this sum is below a specified threshold value. We might alternatively prune the network by removing (“cutting”) all edges below a specified weight, and then throwing away nodes that aren’t connected to other nodes. (For example, we might remove connections between companies that only share a single director, and then throw away companies that aren’t connected to any other companies. Which is to say, we cut out companies that don’t share two or more directors with any other single company. When you start working with graphs, you begin to realise quite how beautiful, and powerful, a way they are for working with data elements that are related to each other in some way.)
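The two filters behave differently, which a small sketch may make clearer. The edge weights below are hypothetical, and the function names are illustrative rather than drawn from any particular graph library.

```python
def filter_by_strength(edges, threshold):
    """Drop nodes whose incident edge weights sum to less than `threshold`."""
    strength = {}
    for (a, b), weight in edges.items():
        strength[a] = strength.get(a, 0) + weight
        strength[b] = strength.get(b, 0) + weight
    keep = {node for node, total in strength.items() if total >= threshold}
    return {(a, b): w for (a, b), w in edges.items() if a in keep and b in keep}


def cut_light_edges(edges, min_weight):
    """Drop edges lighter than `min_weight`; nodes left with no edges vanish too."""
    return {pair: w for pair, w in edges.items() if w >= min_weight}


# Hypothetical company graph: edge weight = number of shared directors.
edges = {
    ("C1", "C2"): 1,
    ("C2", "C3"): 2,
    ("C1", "C4"): 1,
    ("C1", "C5"): 1,
}

by_strength = filter_by_strength(edges, threshold=2)
by_cut = cut_light_edges(edges, min_weight=2)
print(by_strength)
print(by_cut)
```

C1 survives the first filter because its three single-director links add up to 3, but it disappears under the second, since it does not share two directors with any one company.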
  56. PageRank
  57. Here’s an example of the Shell corporate sprawl with the directors removed and edges connecting companies that share two or more directors. The labels are sized relative to the PageRank score of each node, which is a measure of how well connected the node is in the graph (the “importance” of each node is dependent on the “importance” of the nodes connected to it…). The lines also provide a background that highlights the connectivity, and structure, of the corporate elements.
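To show what “importance depends on the importance of neighbours” means in practice, here is a small power-iteration sketch of PageRank on an undirected weighted graph. It is an illustration under assumptions (damping factor 0.85, a fixed number of iterations, hypothetical node names) and is not claimed to match the variant used to size the labels in the slides.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank for an undirected graph given as weighted edges."""
    neighbours = {}
    for (a, b), weight in edges.items():
        neighbours.setdefault(a, {})[b] = weight
        neighbours.setdefault(b, {})[a] = weight
    nodes = list(neighbours)
    rank = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        # Every node gets a small baseline share, then inherits rank from neighbours.
        new_rank = {node: (1.0 - damping) / len(nodes) for node in nodes}
        for node in nodes:
            total = sum(neighbours[node].values())
            for other, weight in neighbours[node].items():
                new_rank[other] += damping * rank[node] * weight / total
        rank = new_rank
    return rank


# Hypothetical graph: one hub company linked to three others.
edges = {("Hub", "A"): 2, ("Hub", "B"): 2, ("Hub", "C"): 1}
rank = pagerank(edges)
print(max(rank, key=rank.get))
```

Each node passes its score to its neighbours in proportion to edge weight, so a company that shares more directors with a well-connected company ends up with a larger label.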
  58. Betweenness
  59. In this view, I have resized the labels based on the betweenness centrality of each node. This network statistic highlights nodes that play an important role in connecting clusters or groupings of nodes. So for example, we see the suggestion that The Consolidated Petroleum Company and Shell Mex and BP Limited may be the companies that connect the Shell sprawl to the BP one.
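Betweenness centrality counts how many shortest paths between other nodes pass through a given node. The sketch below uses Brandes' algorithm for an unweighted, undirected graph and reports unnormalised scores. The graph is a made-up toy of two clusters joined by a single node; it is not the Shell/BP data.

```python
from collections import deque


def betweenness(adjacency):
    """Brandes' algorithm: unnormalised betweenness, unweighted undirected graph."""
    score = {v: 0.0 for v in adjacency}
    for source in adjacency:
        stack = []
        preds = {v: [] for v in adjacency}
        sigma = {v: 0 for v in adjacency}   # number of shortest paths from source
        dist = {v: -1 for v in adjacency}
        sigma[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:  # breadth-first search outwards from the source
            v = queue.popleft()
            stack.append(v)
            for w in adjacency[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adjacency}
        while stack:  # accumulate dependencies, farthest nodes first
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Each unordered pair was counted from both ends, so halve the totals.
    return {v: s / 2 for v, s in score.items()}


# Hypothetical toy graph: two triangles joined through a single "Bridge" node.
links = [("S1", "S2"), ("S1", "S3"), ("S2", "S3"), ("S3", "Bridge"),
         ("Bridge", "B1"), ("B1", "B2"), ("B1", "B3"), ("B2", "B3")]
adjacency = {}
for a, b in links:
    adjacency.setdefault(a, set()).add(b)
    adjacency.setdefault(b, set()).add(a)

scores = betweenness(adjacency)
print(max(scores, key=scores.get))
```

The node joining the two clusters gets the top score even though it has only two connections, which is the kind of intermediary role the label sizing is meant to surface.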
  60. Betweenness (repositioned)
  61. This is just a tweaking of the layout of the previous graph to try to highlight the separation of the different clusters.
  62. Companies & Directors [diagram: companies C1, C2, C3 and directors D1, D2, D3]
  63. Just as we collapsed the network to show how companies could be linked directly by virtue of co-directorships, so we can collapse the network to show how directors are connected. For example, director D1 is connected by a single shared company to directors D2 and D3, whereas D2 and D3 are connected by two companies.
  64. Co-Directors [diagram: directors D1, D2, D3]
  65. Once again, we use line thickness (that is, edge weight) to denote how heavily connected directors are.
  66. PageRank
  67. Here’s a view over connected directors in the Shell corporate sprawl.
  68. [Workflow diagram: OpenCorporates, Scraperwiki db, JSON, D3.js, Networkx, Gexf, Gephi, sigma.js]
  69. As to how we get those graphs plotted? I built a crude workflow in Scraperwiki that gets data out of the scraped database and into a form that allows it to be visualised using the Gephi desktop tool or in a web page using different Javascript libraries (sigma.js or d3.js).
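As an illustration of the “into a form that can be visualised” step, the sketch below serialises a weighted company graph as node-link JSON, a shape commonly used in d3.js force-layout examples. The field names (`name`, `source`, `target`, `value`) and the helper `to_node_link` are assumptions; the format actually produced by the Scraperwiki workflow is not shown in the slides.

```python
import json


def to_node_link(edges):
    """Convert {(a, b): weight} into a nodes/links structure with index references."""
    names = sorted({node for pair in edges for node in pair})
    index = {name: i for i, name in enumerate(names)}
    return {
        "nodes": [{"name": name} for name in names],
        "links": [
            {"source": index[a], "target": index[b], "value": weight}
            for (a, b), weight in sorted(edges.items())
        ],
    }


# Hypothetical company graph: edge weight = number of shared directors.
edges = {("C1", "C2"): 1, ("C2", "C3"): 2}

payload = to_node_link(edges)
text = json.dumps(payload, indent=2)
print(text)
```

A desktop tool such as Gephi would need a different export (the slide mentions Gexf), which is not sketched here.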
  70. This is Gephi, a cross-platform desktop tool that’s great for generating effective network visualisations. I have some tutorials and sample datasets if anyone wants to give it a whirl…
  71. “Where” Next…?
      - geocode registered addresses
      - explore non-GB registered companies
  72. So where can we take the OpenCorporates data next? I have a couple of ideas:
      - we can go spatial in a geographical sense and start to geocode the registered addresses of companies, to see whether any of them are located in offshore tax havens, for example, or to see whether there are different registered addresses that might lead us to yet more companies (by virtue of sharing common registered office addresses, rather than co-directors, for example);
      - we could start trying to tie non-GB registered companies into the mix. At the moment, director information for other territories is sparse. Might there be some other way we can look for connections?
  73. And “When”?
      - company timelines (set-up dates, renaming)
      - explore director timelines (by company)
      - explore director timelines (by director)
  74. Another approach might be to start analysing corporate sprawls in a time dimension. There are several opportunities here:
      - if we have access to company formation and dissolution dates, we can map out a timeline of a corporate sprawl, which might reveal how companies change name, directorship or association with other companies;
      - if we get all the director information associated with a company, we can visualise how director appointments and terminations occurred across one or more companies, which might in turn reveal identifiable “features” that we might be able to associate with news or business restructuring events;
      - if we track down companies a particular director appears to be associated with, we can start to develop “career timelines” of directors, showing how they have been associated with different corporate groupings over time (and maybe the odd company on the side…)
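A director timeline for a single company might be sketched like this, assuming appointment and termination dates are available as simple records. The record layout, the names and the dates are all hypothetical.

```python
from datetime import date


def timeline(appointments):
    """Flatten (name, start, end) records into a date-ordered list of events."""
    events = []
    for name, start, end in appointments:
        events.append((start, "appointed", name))
        if end is not None:  # None means the appointment is still current
            events.append((end, "terminated", name))
    return sorted(events)


def active_directors(appointments, on):
    """Names of directors whose appointment covers the given date."""
    return sorted(
        name for name, start, end in appointments
        if start <= on and (end is None or on < end)
    )


# Hypothetical appointment records for one company.
appointments = [
    ("D1", date(2005, 3, 1), date(2010, 6, 30)),
    ("D2", date(2008, 1, 15), None),
    ("D3", date(2010, 7, 1), None),
]

events = timeline(appointments)
board_2009 = active_directors(appointments, on=date(2009, 1, 1))
board_2011 = active_directors(appointments, on=date(2011, 1, 1))
print(events)
print(board_2009, board_2011)
```

A termination followed a day later by an appointment, as with D1 and D3 here, is the sort of “feature” that might be lined up against news of a restructuring.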
  75. Linking out and in
      - linking companies or directors with external datasets
  76. Whilst it is possible to generate insight from the analysis of data that is contained just within OpenCorporates, there are likely to be many opportunities for using OpenCorporates to annotate other datasets, or for using external datasets to annotate OpenCorporates data.
  77. Sankey Flow Diagrams
  78. As this example starts to explore, we might try to reconcile company names as recorded in local spending data records with corporate entities identified within OpenCorporates to build up a better picture of how money flows into corporate sprawls. On a lobbying front, we might look for mentions of meetings between government officials and company officers, and then try to make mappings between government departments and operational areas of a corporate sprawl, and so on.
  79. What do you think?
  80. [ This is part of an ongoing informal exploration of the patterns and structures we can find across large open datasets. For more information, follow: blog.ouseful.info and @psychemedia. All comments welcome. ]
