The closed world of company data
         And why we need to open it up
We live in a corporate world


   ...but much of it is invisible to us
   (citizens, journalists, NGOs, regulators,
   governments, SMEs, etc...)
Companies no
longer look like this
                        nor even like this
But like this
       or like this
And they are more complex by the day
Growing in scale – not 10s of legal entities but 1000s
Growing in speed – we are seeing the beginnings of high-frequency company formation
Growing in opacity – use of secrecy jurisdictions and off-register entities as firewalls against tax, regulation and information disclosure
Growing in complexity – not a hierarchy but a complex, sometimes even circular, network of entities
Getting the data matters

• In the 21st century, data is power
• We’ve always been governed by data; now our lives are data
• There is a huge asymmetry of access to public data
• It is collected and sold to enrich global proprietary databases, yet denied to citizens
Information is the currency of democracy
                    Thomas Jefferson
DATA is the currency of democracy
                  Thomas Jefferson
This matters
No understanding = no control
• Leads to systemic problems – Lehman Brothers, pollution exporting, market failures, etc.
• Reduces accountability and weakens corporate governance
• Enables and encourages companies to behave as bad corporate citizens
• Enables money laundering, organised crime and corruption (see the World Bank’s Puppet Masters report)
• Remember, companies are artificial entities given legal personality by the state for the good of society
...as recognised by the Open Government Partnership
5 Grand Goals
1. Improving Public Services
2. Increasing Public Integrity
3. More Effectively Managing Public Resources
4. Creating Safer Communities
5. Increasing Corporate Accountability
So how do the OGP countries score for access to company data?
FAIL!
4 key measures
Basic search: can you search the company register freely, without charge and without registration?
Licence: is there a licence that allows open reuse of the information?
Data: is the information available as open data, either as a data dump or via an API?
Depth: is there sufficient information to get a true picture of the company and those who control it – directors, significant shareholdings and statutory filings?
The results were not good...
For the US we took a straight average of the state registers (this possibly overstates access)
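The slides do not say how each of the four measures was scored. As a minimal sketch, assuming each measure is a simple pass/fail (1/0) and using invented example registers (the names RegisterAssessment and straight_average, and all the values, are hypothetical), a per-register score and a US-style straight average could be computed like this:

```python
from dataclasses import dataclass
from typing import List

# The four measures from the slide. Scoring each as pass/fail (1/0) is an
# assumption; the deck does not state the scoring scheme used.
MEASURES = ("basic_search", "licence", "data", "depth")


@dataclass
class RegisterAssessment:  # hypothetical name
    name: str
    basic_search: bool
    licence: bool
    data: bool
    depth: bool

    def score(self) -> int:
        """Number of the four measures this register passes (0 to 4)."""
        return sum(int(getattr(self, m)) for m in MEASURES)


def straight_average(registers: List[RegisterAssessment]) -> float:
    """Unweighted mean of per-register scores, e.g. across US state registers."""
    scores = [r.score() for r in registers]
    return sum(scores) / len(scores)


# Invented example states and values, for illustration only.
states = [
    RegisterAssessment("State A", basic_search=True, licence=False, data=True, depth=False),
    RegisterAssessment("State B", basic_search=True, licence=False, data=False, depth=False),
    RegisterAssessment("State C", basic_search=False, licence=False, data=False, depth=False),
]
print(straight_average(states))  # 1.0
```

A straight average gives every state register equal weight, regardless of how many companies it holds; weighting by register size would be one alternative.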
On corporate confidentiality & competitive advantage
• There is no good reason why a corporate hierarchy should not be public
• Competitive advantage should be about new products and services, innovation and risking capital, not about devising complex corporate networks that encourage companies to evade regulation, tax and scrutiny
• Confidentiality disproportionately benefits big incumbents, thus stifling competition and innovation
• It disadvantages those companies that want to be good corporate citizens, forcing a race to the bottom
What is OpenCorporates?
A simple (huge) goal: build an openly licensed database with an entry (and URI) for every corporate legal entity in the world
Now over 40 million companies in 52 jurisdictions, including 22 US states
5 core uses
1. An open identifying system
• URIs can be used as common identifiers among a variety of organisations
• They can be used without reference to OpenCorporates
• Because they map to the ID issued by the company register, the corresponding entry in the register (and associated info) can be found, and vice versa
• They fit the new W3C/EU Business Vocabulary
• They can even be used for companies in jurisdictions we haven’t yet imported
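The two-way mapping between URIs and register identifiers can be sketched in a few lines. This assumes the URI pattern https://opencorporates.com/companies/<jurisdiction_code>/<company_number> seen on the OpenCorporates site; the helper names company_uri and parse_company_uri and the company number are hypothetical. Because the URI is derived only from the jurisdiction and the register-issued number, it can be minted even for a company that has not yet been imported.

```python
from typing import Tuple
from urllib.parse import quote, unquote, urlparse

# Assumed URI pattern: https://opencorporates.com/companies/<jurisdiction>/<number>
# Jurisdiction codes such as "gb" or "us_de" follow the site's convention.
BASE = "https://opencorporates.com/companies"


def company_uri(jurisdiction_code: str, company_number: str) -> str:
    """Mint a URI from the identifier issued by the company register."""
    # Percent-encode the number so spaces or slashes cannot break the path.
    return f"{BASE}/{jurisdiction_code.lower()}/{quote(company_number, safe='')}"


def parse_company_uri(uri: str) -> Tuple[str, str]:
    """Recover (jurisdiction_code, company_number) to find the register entry."""
    parts = urlparse(uri).path.strip("/").split("/")
    if len(parts) != 3 or parts[0] != "companies":
        raise ValueError(f"not a company URI: {uri}")
    return parts[1], unquote(parts[2])


uri = company_uri("gb", "01234567")  # hypothetical company number
print(uri)                     # https://opencorporates.com/companies/gb/01234567
print(parse_company_uri(uri))  # ('gb', '01234567')
```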
2. The simple search

Not to be underestimated
Massively reduces friction (how long would it take you to find and search multiple jurisdictions?)
Allows "what if" questions
Potentially generates stories in its own right
3. Source for additional info
• Addresses, filings, status, websites...
• International trademarks, UK government spending, official notices, health & safety violations...
• Other IDs: SEC, CAGE, etc. – allows reverse mapping queries, e.g. show me the legal entity mapped to a given CIK code
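A reverse mapping query amounts to an index from (identifier scheme, value) back to the legal entity. The sketch below is an assumption about how such an index could work, not a description of OpenCorporates' internals; the scheme labels sec_cik and cage, the entity URIs and all identifier values are invented for illustration.

```python
from collections import defaultdict

# Hypothetical entities keyed by URI, each carrying identifiers from other
# schemes (SEC CIK, CAGE code, ...). All values are invented.
ENTITIES = {
    "https://opencorporates.com/companies/us_de/1111111": {"sec_cik": "0000123456", "cage": "1ABC2"},
    "https://opencorporates.com/companies/gb/01234567": {"cage": "K9999"},
}


def build_reverse_index(entities):
    """Map each (scheme, value) pair back to the set of entity URIs that carry it."""
    index = defaultdict(set)
    for uri, ids in entities.items():
        for scheme, value in ids.items():
            index[(scheme, value)].add(uri)
    return index


def lookup(index, scheme, value):
    """Return matching entity URIs, sorted; an empty list if the ID is unknown."""
    # .get avoids inserting an empty entry into the defaultdict on a miss.
    return sorted(index.get((scheme, value), set()))


index = build_reverse_index(ENTITIES)
# "Show me the legal entity mapped to this CIK code"
print(lookup(index, "sec_cik", "0000123456"))
```

Storing a set per identifier, rather than a single entity, means that a data-quality problem where one external ID is attached to several entities shows up as multiple results instead of being silently overwritten.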
4. Reconciliation (matching names to legal entities)
Clean up messy company names (& previous names) by matching them to the legal entity, and from there to other data
Google Refine reconciliation service (specific to jurisdiction)
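The deck does not describe how names are matched. As a hedged illustration of the general idea only, here is a simple normalise-and-compare approach: it lower-cases names, strips accents and punctuation, drops trailing legal-form words, and then matches against both current and previous names. The suffix list, the "jurisdiction/number" entity keys and all company names are assumptions for the example, and real reconciliation would need fuzzier matching and scoring.

```python
import re
import unicodedata
from typing import Dict, List, Optional

# Trailing legal-form words ignored when comparing names (an illustrative
# assumption, not OpenCorporates' actual rule set).
SUFFIXES = {"ltd", "limited", "plc", "inc", "incorporated", "llc",
            "corp", "corporation", "co", "company"}


def normalise(name: str) -> str:
    """Lower-case, strip accents and punctuation, drop trailing legal-form words."""
    name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode()
    words = re.sub(r"[^a-z0-9 ]+", " ", name.lower()).split()
    while words and words[-1] in SUFFIXES:
        words.pop()
    return " ".join(words)


def reconcile(messy: str, entities: Dict[str, List[str]],
              jurisdiction: Optional[str] = None) -> List[str]:
    """IDs of entities whose current or previous names normalise to the same key."""
    key = normalise(messy)
    return [eid for eid, names in entities.items()
            if (jurisdiction is None or eid.split("/")[0] == jurisdiction)
            and any(normalise(n) == key for n in names)]


# Hypothetical entities: id -> [current name, previous names...]
ENTITIES = {
    "gb/01234567": ["Acme Widgets Limited", "Acme Holdings Ltd"],
    "us_de/7654321": ["Acme Widgets, Inc."],
}
print(reconcile("ACME WIDGETS LTD.", ENTITIES))
print(reconcile("ACME WIDGETS LTD.", ENTITIES, jurisdiction="gb"))
```

The first query matches companies in two jurisdictions, which illustrates why restricting reconciliation to a single jurisdiction is useful.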
5. The platform

• API: allows all information, even searches, to be retrieved as data
• Users can now add data too
• Coming soon: the option to match data to companies
How have we done it?

Co-operation – we get data directly from some company registers (UK, NZ, a few US states), and are working with international institutions (EC, W3C, Financial Stability Board, etc.) to improve the visibility and reuse of company info
Community – a lot of the data has been contributed by the open data community (thanks, ScraperWiki)
Cool open-source software (100% open-source platform/tools)
Colossal scraping (100,000s of pages/API calls per day)
Problems (& solutions)

• Company registers consider themselves businesses, not public registers, and sometimes block access
• Slow, poorly designed company register websites (and sometimes they don’t even exist – and not just in developing countries)
• Understanding global data
• International/national jurisdictions
• Big-data problems – ETL, scaling, etc.
Find people who want to help
What next?

We have recently started adding company directors and officers
More public data – political donations, lobbyists, other ID systems
Relationships between corporate entities
More options for the community to add/curate data
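One way to model relationships between corporate entities is as a directed ownership graph. In this representation the "sometimes even circular" structures described at the start show up as cycles. The following minimal sketch uses a depth-first search to find one; the entity names and edges are invented, and representing ownership as a plain adjacency dictionary is an assumption for illustration.

```python
from typing import Dict, List, Optional


def find_cycle(owns: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one ownership cycle as a list of entity names, or None if acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current DFS path, finished
    colour: Dict[str, int] = {}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        colour[node] = GREY
        path.append(node)
        for child in owns.get(node, []):
            state = colour.get(child, WHITE)
            if state == GREY:  # back edge: child is already on the current path
                return path[path.index(child):] + [child]
            if state == WHITE:
                cycle = visit(child)
                if cycle:
                    return cycle
        path.pop()
        colour[node] = BLACK
        return None

    for start in list(owns):
        if colour.get(start, WHITE) == WHITE:
            cycle = visit(start)
            if cycle:
                return cycle
    return None


# Hypothetical group: edges point from a shareholder to the entity it holds.
OWNS = {"HoldCo": ["OpCo", "FinanceCo"], "OpCo": ["IPCo"],
        "IPCo": ["HoldCo"], "FinanceCo": []}
print(find_cycle(OWNS))  # ['HoldCo', 'OpCo', 'IPCo', 'HoldCo']
```

The search is recursive for brevity; a group with thousands of entities in one deep chain would need an iterative version to stay within Python's recursion limit.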
