Open Corporate Data: not just good, better

4,867 views

Published on

Presentation given by Chris Taggart, CEO and Co-Founder of OpenCorporates at Open Knowledge Festival, Geneva, September 2013

Discussing benefits and quality of open corporate hierarchy (network) data

0 Comments
3 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total views
4,867
On SlideShare
0
From Embeds
0
Number of Embeds
1,769
Actions
Shares
0
Downloads
27
Comments
0
Likes
3
Embeds 0
No embeds

No notes for slide

Open Corporate Data: not just good, better

  1. 1. Open Data Not Just Good. Better
  2. 2. Open Data is Good! http://www.flickr.com/photos/stolidsoul/433129708/sizes/o/in/photostream/
  3. 3. But we’re not the ones we need to convince http://okfestival.org/open-government-data-camp/
  4. 4. Most people don’t care about ‘open’ http://www.flickr.com/photos/erlin1/9312646298/sizes/l/in/photostream/
  5. 5. Even though open data is better (than closed/proprietary)
  6. 6. Even though open data is better (than closed/proprietary) • Better for innovation
  7. 7. Even though open data is better (than closed/proprietary) • Better for innovation • Better for competition
  8. 8. Even though open data is better (than closed/proprietary) • Better for innovation • Better for competition • Better for efficiency
  9. 9. Even though open data is better (than closed/proprietary) • Better for innovation • Better for competition • Better for efficiency • Better for sharing (esp cross- organisation or cross-border)
  10. 10. But open has a secret weapon http://www.flickr.com/photos/x-ray_delta_one/8493335701/sizes/l/in/photostream/
  11. 11. It’s better quality too http://www.flickr.com/photos/infusionsoft/4484373179/sizes/l/in/photostream/
  12. 12. Problem Cause Data accuracy Data is re-keyed. Few eyeballs. Often little downside to lying Gaps in data High (& often duplicated) cost of data entry. Limited to payers Lack of granularity Legacy systems/data models hard to reengineer in closed world Errors go uncorrected Few feedback mechanisms Black box/No provenance Can’t reveal (sometimes dubious) sources. Limits usefulness/trust Isolated Proprietary IDs are internal identifiers & are barriers to sharing & improved data quality Common proprietary data quality issues
  13. 13. Problem Cause Data accuracy Data is re-keyed. Few eyeballs. Often little downside to lying Gaps in data High (& often duplicated) cost of data entry. Limited to payers Lack of granularity Legacy systems/data models hard to reengineer in closed world Errors go uncorrected Few feedback mechanisms Black box/No provenance Can’t reveal (sometimes dubious) sources. Limits usefulness/trust Isolated Proprietary IDs are internal identifiers & are barriers to sharing & improved data quality Common proprietary data quality issues
  14. 14. Problem Cause Data accuracy Data is re-keyed. Few eyeballs. Often little downside to lying Gaps in data High (& often duplicated) cost of data entry. Limited to payers Lack of granularity Legacy systems/data models hard to reengineer in closed world Errors go uncorrected Few feedback mechanisms Black box/No provenance Can’t reveal (sometimes dubious) sources. Limits usefulness/trust Isolated Proprietary IDs are internal identifiers & are barriers to sharing & improved data quality Common proprietary data quality issues
  15. 15. Problem Cause Data accuracy Data is re-keyed. Few eyeballs. Often little downside to lying Gaps in data High (& often duplicated) cost of data entry. Limited to payers Lack of granularity Legacy systems/data models hard to reengineer in closed world Errors go uncorrected Few feedback mechanisms Black box/No provenance Can’t reveal (sometimes dubious) sources. Limits usefulness/trust Isolated Proprietary IDs are internal identifiers & are barriers to sharing & improved data quality Common proprietary data quality issues
  16. 16. Problem Cause Data accuracy Data is re-keyed. Few eyeballs. Often little downside to lying Gaps in data High (& often duplicated) cost of data entry. Limited to payers Lack of granularity Legacy systems/data models hard to reengineer in closed world Errors go uncorrected Few feedback mechanisms Black box/No provenance Can’t reveal (sometimes dubious) sources. Limits usefulness/trust Isolated Proprietary IDs are internal identifiers & are barriers to sharing & improved data quality Common proprietary data quality issues
  17. 17. Problem Cause Data accuracy Data is re-keyed. Few eyeballs. Often little downside to lying Gaps in data High (& often duplicated) cost of data entry. Limited to payers Lack of granularity Legacy systems/data models hard to reengineer in closed world Errors go uncorrected Few feedback mechanisms Black box/No provenance Can’t reveal (sometimes dubious) sources. Limits usefulness/trust Isolated Proprietary IDs are internal identifiers & are barriers to sharing & improved data quality Common proprietary data quality issues
  18. 18. Problem Cause Data accuracy Data is re-keyed. Few eyeballs. Often little downside to lying Gaps in data High (& often duplicated) cost of data entry. Limited to payers Lack of granularity Legacy systems/data models hard to reengineer in closed world Errors go uncorrected Few feedback mechanisms Black box/No provenance Can’t reveal (sometimes dubious) sources. Limits usefulness/trust Isolated Proprietary IDs are internal identifiers & are barriers to sharing & improved data quality Common proprietary data quality issues
  19. 19. A concrete example: corporate networks
  20. 20. Hugely important (and valuable) • The dataset we need to understand the corporate world • Who we (or the government) is really doing business with • Political influence/donations/lobbying • Tax/resource extraction • Corporate Governance • Credit risk
  21. 21. But proprietary datasets on this are problematic • Expensive, so relatively few users • Huge gaps in data • Uses proprietary IDs (so not clear what it’s refers to) • Restrictive licences • Opaque – no info re calculations, provenance or confidence
  22. 22. But proprietary datasets on this are problematic • Expensive, so relatively few users • Huge gaps in data • Uses proprietary IDs (so not clear what it’s refers to) • Restrictive licences • Opaque – no info re calculations, provenance or confidence Result: low-quality data
  23. 23. The open data alternative
  24. 24. The open data alternative Enabled by a grant from the Alfred P Sloan Foundation
  25. 25. Data from disparate public sources
  26. 26. finding new insights
  27. 27. no such company
  28. 28. ...and errorstoo no such company
  29. 29. What a modern financial company looks like (highly simplified & truncated views)
  30. 30. What a modern financial company looks like (highly simplified & truncated views)
  31. 31. What a modern financial company looks like (highly simplified & truncated views)
  32. 32. What a modern financial company looks like (highly simplified & truncated views) private unlimited company
  33. 33. Crowd-sourcing?
  34. 34. Ninja-sourcing! http://www.flickr.com/photos/danielygo/5531024732/sizes/l/in/photostream/
  35. 35. The company that wants to know your network... every friend... every interaction http://www.flickr.com/photos/jeffmcneill/5260815552/sizes/l/ why bother?
  36. 36. Facebook, Inc This is what we got from their SEC filings as text
  37. 37. Facebook, Inc (and turned into data) This is what we got from their SEC filings as text
  38. 38. Facebook, Inc Pinnacle Sweden AB Vitesse LLC Facebook Operations LLC Facebook Ireland Limited Edge Network Services Limited Andale Acquisition Corp (and turned into data) This is what we got from their SEC filings as text
  39. 39. Facebook Ireland Limited Edge Network Services Limited Pinnacle Sweden AB Vitesse LLC Facebook Operations LLC Andale Acquisition Corp Then we started investigating Facebook, Inc
  40. 40. Facebook Ireland Limited Edge Network Services Limited Then we started investigating Facebook, Inc
  41. 41. Facebook, Inc Facebook Ireland Limited Edge Network Services Limited
  42. 42. Facebook, Inc Facebook Ireland Limited Edge Network Services Limited Facebook Cayman Holdings Unlimited IV Facebook Cayman Holdings Unlimited II Facebook Cayman Holdings Unlimited lll Facebook Ireland Holdings Randomus Investments Limited Facebook International Holdings II Ltd Facebook International Holdings I Ltd Facebook Cayman Holdings Unlimited I
  43. 43. Want to help? jobs@opencorporates.com investigators@opencorporates.com

×