Mapping Corporate Networks With OpenCorporates
Published in: Technology
  • As company filings start to appear as open data, opportunities may arise for watchdogs to start mining this data in support of their investigations and monitoring activities. This presentation introduces several ideas relating to mapping network structures in order to learn something about the structure of “corporate sprawls”, corporate groupings defined on the basis of co-director relationships.
  • To introduce the idea of a network map, let’s have a look at a view we can construct over the Twitter social space…
  • This network map shows Twitter users who are commonly followed by the followers of @TOGYnews. Although hard to see at this scale, the map is actually constructed from labeled points connected by lines (in the jargon, “nodes connected by edges”). The algorithm used to position the labeled nodes tries to place nodes that are heavily connected to each other close to each other. In a sense, we can view the diagram as a map, with regions that are highlighted using false colours identifying clusters of nodes that may in some sense be similar to each other based on the sharing of common followers.
  • The map is constructed using data grabbed from the Twitter API. Using one or more “focus” users (a specific Twitter account, for example, or the set of users of a particular hashtag), we grab a list of their followers.
  • For each of the followers, we grab a list of their friends (or a sample thereof) – that is, a list of some or all of the people they follow on Twitter. We can use this data to construct a network of people followed by the followers of the original focus. It is typically at this point, where there is most relational information contained within the network, that we lay it out using automatic layout tools.
  • Drawing on the insight that people on Twitter are likely to follow accounts that are of interest to them, we can start to imagine the network as a projection of the interests of the people who are interested in one or more of the things the focus is associated with. However, interests of followers may spread to a wide range of topics, so we look for consistency of interest, pruning the network to remove people who are not commonly followed by the followers of the focus. That is, we remove nodes that are followed by only a few of the followers of the focus.
  • Having laid out the network map, we might now tidy it up a little by removing all the nodes that are not themselves followed by a significant number of the followers of the original focus.
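The pruning step described above can be sketched in a few lines of Python. This is a minimal illustration only: the follower and friend lists are made-up stand-ins for data that would come from the Twitter API, and the function name `commonly_followed` and the `min_followers` threshold are assumptions, not part of the original workflow.

```python
from collections import Counter

def commonly_followed(friends_by_follower, min_followers):
    """Return accounts followed by at least `min_followers` followers of the focus."""
    counts = Counter()
    for friends in friends_by_follower.values():
        # Use a set so each follower counts at most once per account.
        counts.update(set(friends))
    return {account: n for account, n in counts.items() if n >= min_followers}

# Hypothetical data: follower of the focus -> accounts that follower follows.
friends_by_follower = {
    "follower_a": ["peer_1", "peer_2", "peer_3"],
    "follower_b": ["peer_1", "peer_2"],
    "follower_c": ["peer_1", "peer_4"],
}

common = commonly_followed(friends_by_follower, min_followers=2)
print(common)
```

Raising `min_followers` keeps only the accounts that a larger share of the focus's followers have in common, which is the "consistency of interest" idea in the notes.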
  • The result is a map that shows groups of people positioned according to the shared projected presumed interests of their followers.
  • It may also be possible to use metadata associated with social networks to develop additional insights. A recent paper describes one way of mining social network data for information about people working for a particular company, and using public biographical information along with social connection data to map out the organisational structures of large companies.
  • A more principled way of looking at corporate structures at a company level may possibly be derived from publicly available corporate information.
  • For example, if we can get hold of directorial appointment and termination data, we can start to construct maps that show how companies are connected by common directors, as well as which companies are co-directed by particular directors. As with the emergent social positioning network maps, if particular directors have particular corporate interests, we may be able to identify particular organisational groupings in corporate sprawls made up from dozens of operating companies working across a range of business areas.
  • One possible source of open company information is OpenCorporates. OpenCorporates’ ambitious aim is to mint a unique corporate identifier for every corporate legal entity in the world [CHECK], as well as collating and normalising (or “harmonising”) information about company filings, trademarks, patents(?) and officers (that is, company directors, company secretaries and so on). For GB registered companies, there is a growing repository of data relating to company directorships, which provides us with an opportunity to develop maps that show how companies are connected by virtue of having common directors.
  • Just a note – my experience in looking at data related to GB registered companies suggests that the directors of the “top”/nominal company in a large multinational grouping are “atypical” compared to the officers appointed to UK based operating companies in the same corporate sprawl, being appointed from the great and the good, or from senior officers who do not take directorships across operating divisions or companies, rather than representing directors of operating companies. When seeding corporate sprawl trawlers – algorithms that try to identify companies that make up a corporate sprawl based on co-directorships – my experience suggests that it often makes sense to seed the search with one or more operating companies whose directors are likely to be directors of other operating companies, rather than with the “top level” company.
  • We can reuse the ideas that underpin the construction of the emergent social positioning graph to map out corporate structures based on director information.
  • As well as corporate information pages, OpenCorporates maintains information pages about directorial appointments. At the moment, there are no authority files providing identifiers that identify the same physical person – each directorial appointment to a company provides the director with a unique officer ID. It is possible to search for officers of other companies with the same name as a particular director, but there are no identifiers that link them as the same physical person. (That said, there does appear to be a slot in the metadata for authoritative identifiers.)
  • So how might we go about constructing a corporate sprawl? Let’s start with one or more seed companies.
  • The general shape of this diagram might remind you of something…? For each of the seed companies, we grab a list of their directors. We can use this data to construct a network of people who are directors or other officers of the original seed company or companies.
  • Here’s another way of imagining it – a company surrounded by its directors.
  • For each of the directors, we run a search for them on OpenCorporates, to see what directorial appointments have been made to other companies for people of exactly the same name. We can use this data to construct a network of companies directed by the directors of the original seed company. For those companies that are directed by N or more of the directors associated with the seed company or companies (where N is typically 2) we might now say they are part of the corporate sprawl. The companies sharing fewer than N directors associated with companies admitted to the corporate sprawl are added to a list of possible candidate companies. As we find more directors associated with companies included in the sprawl, we might be able to “legitimise” membership of these companies within the sprawl.
  • We now have a larger set of companies, reflecting those companies who share N or more directors with the original seed company or companies.
  • If we so decide, we can continue with this snowball discovery process, looking up further directors associated with companies we have included in our sprawl, with a view to trying to discover more companies that should be included in the sprawl.
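The snowball process described in the last few notes might be sketched as follows. This is not the Scraperwiki scraper itself: `BOARDS`, `directors_of`, `companies_of` and `grow_sprawl` are hypothetical names, and the toy dictionary stands in for lookups against OpenCorporates. As in the notes, directors are matched by exact name only, so two different people with the same name would be treated as one.

```python
from collections import defaultdict

# Hypothetical toy data standing in for OpenCorporates lookups:
# company -> names of its current directors.
BOARDS = {
    "C1": {"D1", "D2"},
    "C2": {"D1", "D2", "D3"},
    "C3": {"D2", "D3"},
    "C4": {"D3"},
    "C5": {"D9"},
}

def directors_of(company):
    """Directors of a company (assumed lookup)."""
    return BOARDS.get(company, set())

def companies_of(director_name):
    """Companies with a director of exactly this name (assumed lookup)."""
    return {c for c, names in BOARDS.items() if director_name in names}

def grow_sprawl(seeds, n=2, max_rounds=5):
    """Admit companies sharing n or more directors with the sprawl so far."""
    sprawl = set(seeds)
    candidates = set()
    for _ in range(max_rounds):
        # Collect the directors of every company admitted so far.
        sprawl_directors = set()
        for company in sprawl:
            sprawl_directors.update(directors_of(company))
        # For each outside company, record which sprawl directors it shares.
        shared = defaultdict(set)
        for director in sprawl_directors:
            for company in companies_of(director):
                if company not in sprawl:
                    shared[company].add(director)
        admitted = {c for c, ds in shared.items() if len(ds) >= n}
        candidates = {c for c, ds in shared.items() if len(ds) < n}
        if not admitted:
            break  # nothing new qualified, so the snowball has stopped growing
        sprawl |= admitted
    return sprawl, candidates

sprawl, candidates = grow_sprawl(["C1"])
print(sorted(sprawl), sorted(candidates))
```

In this toy run C3 starts out as a candidate (it shares only D2 with the seed), and is admitted in the next round once C2 brings D3 into the sprawl, which is the "legitimising" effect the notes describe.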
  • Using this snowball approach, I have constructed a scraper on Scraperwiki that mines OpenCorporates, given one or more seed companies (or seed directors), to map out corporate sprawls, limiting myself to the capture of current directors and active companies registered in the UK. (The code needs checking and is perhaps not as easy to use as it might be. Developing a more robust and user-friendly tool may be worth exploring if this approach is seen to be useful.)
  • So – we can generate a network that connects companies with their directors, and grow this network out to identify companies that share several directors. As with the emergent social positioning map, we can use automatic layout tools to try to position companies and directors close to each other based on their connectivity, producing a map over the corporate sprawl.
  • We can view this network in various ways. For example, we might choose to view just the companies.
  • This map shows companies in a corporate sprawl grown out from Royal Dutch Shell. Note the presence of BP in there – somehow, these two groupings are connected by shared directorships of some intermediate company.
  • One of the nice things about representing this sort of structure in an abstract mathematical or computational way is that we can wrangle it with code... So for example, companies C1 and C2 are connected by a single shared director, whereas C2 and C3 are connected by two directors.
  • We can represent this by transforming the original bipartite (two types of node) graph that connects directors to companies and companies to directors into a graph that just connects companies that were connected by directors. The thickness of the line (or “edge”) connecting the companies represents its “weight”, which in this case is given by the number of shared directors between connected companies.
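A minimal sketch of this projection in plain Python follows. The board memberships are an assumed arrangement chosen to be consistent with the C1/C2/C3 and D1/D2/D3 example in the notes (the slides do not spell out the exact memberships), and the helper names `project` and `invert` are illustrative.

```python
from itertools import combinations

def project(memberships):
    """Collapse {node: set_of_neighbours} into weighted edges between nodes.

    The weight of an edge is the number of neighbours the two nodes share.
    """
    weights = {}
    for a, b in combinations(sorted(memberships), 2):
        shared = memberships[a] & memberships[b]
        if shared:
            weights[(a, b)] = len(shared)
    return weights

def invert(memberships):
    """Turn company -> directors into director -> companies (or vice versa)."""
    inverted = {}
    for key, members in memberships.items():
        for member in members:
            inverted.setdefault(member, set()).add(key)
    return inverted

# Assumed memberships, consistent with the example in the notes.
directors_by_company = {
    "C1": {"D1"},
    "C2": {"D1", "D2", "D3"},
    "C3": {"D2", "D3"},
}

company_edges = project(directors_by_company)
director_edges = project(invert(directors_by_company))
print(company_edges)
print(director_edges)
```

Applying the same function to the inverted mapping gives the director-to-director view, where the weight is the number of companies two directors share.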
  • We can also filter the graph, for example by adding together the weights of all the edges incident on a node, and throwing away all nodes for which this sum is below a specified threshold value. We might alternatively prune the network by removing (“cutting”) all edges below a specified weight, and then throwing away nodes that aren’t connected to other nodes. (For example, we might remove connections between companies that only share a single director, and then throw away companies that aren’t connected to any other companies. Which is to say, we cut out companies that don’t share two or more directors with any other single company. When you start working with graphs, you begin to realise quite how beautiful, and powerful, a way they are for working with data elements that are related to each other in some way.)
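The two filters can be compared on a small example. This is a sketch under stated assumptions: the edge weights below are invented, the graph is stored as a dictionary of weighted edges, and the function names are hypothetical. The weighted-degree filter here makes a single pass and does not recompute sums after nodes have been dropped.

```python
from collections import Counter

def filter_by_weighted_degree(weights, threshold):
    """Keep edges whose endpoints both have summed edge weight >= threshold."""
    strength = Counter()
    for (a, b), w in weights.items():
        strength[a] += w
        strength[b] += w
    keep = {node for node, s in strength.items() if s >= threshold}
    return {(a, b): w for (a, b), w in weights.items() if a in keep and b in keep}

def cut_light_edges(weights, min_weight):
    """Remove edges lighter than min_weight; unconnected nodes vanish with them."""
    return {edge: w for edge, w in weights.items() if w >= min_weight}

def nodes_of(weights):
    """Nodes that still have at least one edge."""
    return {node for edge in weights for node in edge}

# Invented weights: number of directors shared by each pair of companies.
weights = {
    ("C1", "C2"): 1,
    ("C1", "C4"): 1,
    ("C2", "C3"): 2,
    ("C2", "C4"): 1,
}

by_degree = filter_by_weighted_degree(weights, threshold=2)
by_cut = cut_light_edges(weights, min_weight=2)
print(sorted(nodes_of(by_degree)))
print(sorted(nodes_of(by_cut)))
```

The example shows why the two filters are not interchangeable: C1 and C4 survive the weighted-degree filter because several single-director links add up, but they are dropped by the edge cut because they do not share two or more directors with any single company.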
  • Here’s an example of the Shell corporate sprawl with the directors removed and edges connecting companies that share two or more directors. The labels are sized relative to the PageRank score of each node, which is a measure of how well connected the node is in the graph (the “importance” of each node is dependent on the “importance” of the nodes connected to it…). The lines also provide a background that highlights the connectivity – and structure – of the corporate elements.
  • In this view, I have resized the labels based on the betweenness centrality of each node. This network statistic highlights nodes that play an important role in connecting clusters or groupings of nodes. So for example, we see the suggestion that The Consolidated Petroleum Company and Shell Mex and BP Limited may be the companies that connect the Shell sprawl to the BP one.
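To make the two statistics concrete, here is a from-scratch sketch of PageRank (by power iteration) and betweenness centrality (Brandes' algorithm) on a small invented graph: two triangles joined through a bridging node X. It is not the implementation used to size the labels in these views, the node names are made up, and the betweenness function ignores edge weights for simplicity.

```python
from collections import deque

def build_adjacency(edges):
    """Undirected weighted adjacency: node -> {neighbour: weight}."""
    adj = {}
    for a, b, w in edges:
        adj.setdefault(a, {})[b] = w
        adj.setdefault(b, {})[a] = w
    return adj

def pagerank(adj, damping=0.85, iterations=100):
    """Power-iteration PageRank on an undirected weighted graph."""
    nodes = list(adj)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    strength = {n: sum(adj[n].values()) for n in nodes}
    for _ in range(iterations):
        new = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for n in nodes:
            if strength[n] == 0:
                # A node with no edges spreads its rank evenly.
                for m in nodes:
                    new[m] += damping * rank[n] / len(nodes)
            else:
                for m, w in adj[n].items():
                    new[m] += damping * rank[n] * w / strength[n]
        rank = new
    return rank

def betweenness(adj):
    """Unweighted betweenness centrality (Brandes), undirected graph."""
    score = {n: 0.0 for n in adj}
    for s in adj:
        stack, queue = [], deque([s])
        preds = {n: [] for n in adj}
        sigma = {n: 0 for n in adj}  # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {n: 0.0 for n in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted from both ends.
    return {n: b / 2 for n, b in score.items()}

edges = [
    ("A1", "A2", 1), ("A1", "A3", 1), ("A2", "A3", 1),  # first cluster
    ("B1", "B2", 1), ("B1", "B3", 1), ("B2", "B3", 1),  # second cluster
    ("A3", "X", 1), ("X", "B1", 1),                     # bridge via X
]
adj = build_adjacency(edges)
pr = pagerank(adj)
bc = betweenness(adj)
print(max(bc, key=bc.get))
```

The bridging node X gets the highest betweenness because every shortest path between the two clusters runs through it, which mirrors the way the intermediate companies stand out in the betweenness view.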
  • This is just a tweaking of the layout of the previous graph to try to highlight the separation of the different clusters.
  • Just as we collapsed the network to show how companies could be linked directly by virtue of co-directorships, so we can collapse the network to show how directors are connected. For example, director D1 is connected by a single shared company to directors D2 and D3, whereas D2 and D3 are connected by two companies.
  • Once again, we use line thickness (that is, edge weight) to denote how heavily connected directors are.
  • Here’s a view over connected directors in the Shell corporate sprawl.
  • As to how we get those graphs plotted? I built a crude workflow in Scraperwiki that gets data out of the scraped database and into a form that allows it to be visualised using the Gephi desktop tool or in a web page using different Javascript libraries (sigma.js or d3.js).
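As a rough illustration of the "into a form that allows it to be visualised" step, the sketch below turns a weighted edge dictionary into a nodes-and-links JSON structure of the kind commonly fed to d3.js force layouts. The exact format produced by the original workflow is not shown in the slides, so the field names (`name`, `source`, `target`, `value`) and the function name `to_node_link` are assumptions.

```python
import json

def to_node_link(weights):
    """Convert {(a, b): weight} into a nodes/links structure for the browser."""
    names = sorted({node for edge in weights for node in edge})
    index = {name: i for i, name in enumerate(names)}
    return {
        "nodes": [{"name": name} for name in names],
        "links": [
            {"source": index[a], "target": index[b], "value": w}
            for (a, b), w in sorted(weights.items())
        ],
    }

# Invented weights: shared directors between pairs of companies.
weights = {("C1", "C2"): 1, ("C2", "C3"): 2}

data = to_node_link(weights)
print(json.dumps(data, indent=2))
```

For Gephi, the same edge list would instead be written out in a graph file format such as GEXF, which the workflow diagram lists alongside Networkx.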
  • This is Gephi – a cross-platform desktop tool that’s great for generating effective network visualisations. I have some tutorials and sample datasets if anyone wants to give it a whirl…
  • So where can we take the OpenCorporates data next? I have a couple of ideas: we can go spatial in a geographical sense and start to geocode the registered addresses of companies, to see whether any of them are located in offshore tax havens, for example, or to see whether there are different registered addresses that might lead us to yet more companies (by virtue of sharing common registered office addresses, rather than co-directors, for example); we could start trying to tie non-GB registered companies into the mix. At the moment, director information for other territories is sparse – might there be some other way we can look for connections?
  • Another approach might be to start analysing corporate sprawls in a time dimension. There are several opportunities here: if we have access to company formation and dissolution dates, we can map out a timeline of a corporate sprawl, which might reveal how companies change name, directorship or association with other companies; if we get all the director information associated with a company, we can visualise how director appointments and terminations occurred across one or more companies, which might in turn reveal identifiable “features” that we might be able to associate with news or business restructuring events; if we track down companies a particular director appears to be associated with, we can start to develop “career timelines” of directors, showing how they have been associated with different corporate groupings over time (and maybe the odd company on the side…)
  • Whilst it is possible to generate insight from the analysis of data that is contained just within OpenCorporates, there are likely to be many opportunities for using OpenCorporates to annotate other datasets, or for using external datasets to annotate OpenCorporates data.
  • As this example starts to explore, we might try to reconcile company names as recorded in local spending data records with corporate entities identified within OpenCorporates to build up a better picture of how money flows into corporate sprawls. On a lobbying front, we might look for mentions of meetings between government officials and company officers, and then try to make mappings between government departments and operational areas of a corporate sprawl, and so on.
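One simple way to approach the name reconciliation idea is to normalise names before matching them. The sketch below is an assumption about how that might be done, not a description of any OpenCorporates reconciliation service: the normalisation rules, the identifier `hypothetical-id-1`, the supplier names and the payment amounts are all invented for illustration.

```python
import re

def normalise(name):
    """Reduce a company name to a crude matching key."""
    key = name.upper().replace("&", " AND ")
    key = re.sub(r"[.']", "", key)          # B.P. -> BP
    key = re.sub(r"[^A-Z0-9 ]", " ", key)   # other punctuation becomes a space
    key = re.sub(r"\bLIMITED\b", "LTD", key)
    return " ".join(key.split())

def reconcile(spend_records, register):
    """Total spend per matched company id, plus the payee names left unmatched."""
    ids_by_key = {normalise(name): company_id for name, company_id in register.items()}
    totals, unmatched = {}, []
    for payee, amount in spend_records:
        company_id = ids_by_key.get(normalise(payee))
        if company_id is None:
            unmatched.append(payee)
        else:
            totals[company_id] = totals.get(company_id, 0.0) + amount
    return totals, unmatched

# Invented register entry and spending lines, for illustration only.
register = {"Shell Mex and BP Limited": "hypothetical-id-1"}
spend_records = [
    ("Shell-Mex & B.P. Ltd", 1200.0),
    ("Unknown Supplier Ltd", 50.0),
]

totals, unmatched = reconcile(spend_records, register)
print(totals, unmatched)
```

Exact matching on a normalised key will miss misspellings and trading names, so the unmatched list would still need manual review or a fuzzier matching step.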
  • Transcript

    • 1. OpenCorporates Co-Director Mapping. Tony Hirst, Dept of Communications and Systems, The Open University
    • 3. Social Media Mapping: Introducing “Graphs”
    • 5. Emergent Social Positioning
    • 7. Find the followers (diagram labels: A, B, focus; edge: “Is followed by”)
    • 9. Find Friends of Followers (diagram labels: A, B, peer, peer, focus; edge: “Is followed by”)
    • 11. Find Common Friends of Followers (diagram labels: A, B, peer, focus; edge: “Is followed by”)
    • 13. Filter out not commonly followed (diagram labels: peer, focus)
    • 15. Emergent Social Positioning
    • 17. A More Principled Approach
    • 19. Corporate Structure Maps: Introducing “Graphs”
    • 21. Companies & Directors (diagram labels: C1, C2, C3, D1, D2, D3)
    • 23. Company Records on OpenCorporates
    • 25. Subsidiary Companies have “working” directors
    • 27. Co-Director Mapping: More Graphs
    • 28. We can reuse the ideas that underpin the construction of the emergent social positioning graph to map out corporate structures based on director information.
    • 29. Director Records on OpenCorporates
    • 31. Start With One or More Seed Company
    • 33. Find Friends of Followers (diagram labels: C1, D1, D2; edge: “Has director”)
    • 35. Find Directors of Seed Company(s)
    • 37. Find Friends of Followers (diagram labels: C1, C2, D1, D2; edge: “Has director”)
    • 39. Find Companies With Two or More Seed Directors
    • 41. Find Friends of Followers (diagram labels: C1, C2, D1, D2, D3; edge: “Has director”)
    • 44. Companies & Directors (diagram labels: C1, C2, C3, D1, D2, D3)
    • 46. Companies (diagram labels: C1, C2, C3)
    • 48. PageRank
    • 50. Companies & Directors (diagram labels: C1, C2, C3, D1, D2, D3)
    • 52. Companies Sharing Directors (diagram labels: C1, C2, C3)
    • 54. Companies Sharing Two or More Directors (diagram labels: C2, C3)
    • 56. PageRank
    • 58. Betweenness
    • 60. Betweenness (repositioned)
    • 62. Companies & Directors (diagram labels: C1, C2, C3, D1, D2, D3)
    • 64. Co-Directors (diagram labels: D1, D2, D3)
    • 66. PageRank
    • 68. Workflow diagram (labels: OpenCorporates, Scraperwiki db, JSON, D3.js, Networkx, Gexf, Gephi, sigma.js)
    • 71. “Where” Next…? Geocode registered addresses; explore non-GB registered companies
    • 73. And “When”? Company timelines (set-up dates, renaming); explore director timelines (by company); explore director timelines (by director)
    • 75. Linking out and in: linking companies or directors with external datasets
    • 77. Sankey Flow Diagrams
    • 79. What do you think?
    • 80. [ This is part of an ongoing informal exploration of the patterns and structures we can find across large open datasets. For more information, follow: blog.ouseful.info; @psychemedia. All comments welcome. ]