Open Corporate Data: not just good, better

Presentation given by Chris Taggart, CEO and Co-Founder of OpenCorporates at Open Knowledge Festival, Geneva, September 2013

Discussing benefits and quality of open corporate hierarchy (network) data


Transcript

  • 1. Open Data Not Just Good. Better
  • 2. Open Data is Good! http://www.flickr.com/photos/stolidsoul/433129708/sizes/o/in/photostream/
  • 3. But we’re not the ones we need to convince http://okfestival.org/open-government-data-camp/
  • 4. Most people don’t care about ‘open’ http://www.flickr.com/photos/erlin1/9312646298/sizes/l/in/photostream/
  • 5. Even though open data is better (than closed/proprietary)
  • 6. Even though open data is better (than closed/proprietary) • Better for innovation
  • 7. Even though open data is better (than closed/proprietary) • Better for innovation • Better for competition
  • 8. Even though open data is better (than closed/proprietary) • Better for innovation • Better for competition • Better for efficiency
  • 9. Even though open data is better (than closed/proprietary) • Better for innovation • Better for competition • Better for efficiency • Better for sharing (esp cross-organisation or cross-border)
  • 10. But open has a secret weapon http://www.flickr.com/photos/x-ray_delta_one/8493335701/sizes/l/in/photostream/
  • 11. It’s better quality too http://www.flickr.com/photos/infusionsoft/4484373179/sizes/l/in/photostream/
  • 12. Common proprietary data quality issues
    Problem: Cause
    Data accuracy: Data is re-keyed. Few eyeballs. Often little downside to lying
    Gaps in data: High (& often duplicated) cost of data entry. Limited to payers
    Lack of granularity: Legacy systems/data models hard to reengineer in closed world
    Errors go uncorrected: Few feedback mechanisms
    Black box/No provenance: Can’t reveal (sometimes dubious) sources. Limits usefulness/trust
    Isolated: Proprietary IDs are internal identifiers & are barriers to sharing & improved data quality
  • 19. A concrete example: corporate networks
  • 20. Hugely important (and valuable) • The dataset we need to understand the corporate world • Who we (or the government) are really doing business with • Political influence/donations/lobbying • Tax/resource extraction • Corporate Governance • Credit risk
  • 21. But proprietary datasets on this are problematic • Expensive, so relatively few users • Huge gaps in data • Uses proprietary IDs (so not clear what it refers to) • Restrictive licences • Opaque – no info re calculations, provenance or confidence
  • 22. But proprietary datasets on this are problematic • Expensive, so relatively few users • Huge gaps in data • Uses proprietary IDs (so not clear what it refers to) • Restrictive licences • Opaque – no info re calculations, provenance or confidence Result: low-quality data
  • 23. The open data alternative
  • 24. The open data alternative Enabled by a grant from the Alfred P Sloan Foundation
  • 25. Data from disparate public sources
  • 26. finding new insights
  • 27. no such company
  • 28. ...and errors too no such company
  • 29. What a modern financial company looks like (highly simplified & truncated views)
  • 30. What a modern financial company looks like (highly simplified & truncated views)
  • 31. What a modern financial company looks like (highly simplified & truncated views)
  • 32. What a modern financial company looks like (highly simplified & truncated views) private unlimited company
  • 33. Crowd-sourcing?
  • 34. Ninja-sourcing! http://www.flickr.com/photos/danielygo/5531024732/sizes/l/in/photostream/
  • 35. The company that wants to know your network... every friend... every interaction http://www.flickr.com/photos/jeffmcneill/5260815552/sizes/l/ why bother?
  • 36. Facebook, Inc This is what we got from their SEC filings as text
  • 37. Facebook, Inc (and turned into data) This is what we got from their SEC filings as text
  • 38. Facebook, Inc Pinnacle Sweden AB Vitesse LLC Facebook Operations LLC Facebook Ireland Limited Edge Network Services Limited Andale Acquisition Corp (and turned into data) This is what we got from their SEC filings as text
  • 39. Facebook Ireland Limited Edge Network Services Limited Pinnacle Sweden AB Vitesse LLC Facebook Operations LLC Andale Acquisition Corp Then we started investigating Facebook, Inc
  • 40. Facebook Ireland Limited Edge Network Services Limited Then we started investigating Facebook, Inc
  • 41. Facebook, Inc Facebook Ireland Limited Edge Network Services Limited
  • 42. Facebook, Inc Facebook Ireland Limited Edge Network Services Limited Facebook Cayman Holdings Unlimited IV Facebook Cayman Holdings Unlimited II Facebook Cayman Holdings Unlimited III Facebook Ireland Holdings Randomus Investments Limited Facebook International Holdings II Ltd Facebook International Holdings I Ltd Facebook Cayman Holdings Unlimited I
  • 43. Want to help? jobs@opencorporates.com investigators@opencorporates.com
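
Slides 36 to 38 describe taking the subsidiary names found as text in Facebook, Inc's SEC filings and turning them into data. A minimal sketch of what such records might look like follows, with provenance attached to each claim, since slide 21 criticises proprietary datasets for having none. The `Relationship` type, its field names, the `children_of` helper and the source label are all hypothetical and not taken from the talk or from OpenCorporates. Treating every listed name as a plain "subsidiary of Facebook, Inc" claim is also an assumption, since the slides do not say whether control is direct.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Relationship:
    """One 'child is a subsidiary of parent' claim (hypothetical schema)."""
    parent: str
    child: str
    source: str                         # provenance: where the claim was found
    confidence: Optional[float] = None  # None = not assessed (no score is given in the talk)


# Company names exactly as listed on slide 38.
SLIDE_38_NAMES = [
    "Pinnacle Sweden AB",
    "Vitesse LLC",
    "Facebook Operations LLC",
    "Facebook Ireland Limited",
    "Edge Network Services Limited",
    "Andale Acquisition Corp",
]

# The source label is illustrative; the slides only say "SEC filings".
relationships = [
    Relationship(parent="Facebook, Inc", child=name, source="SEC filing (text)")
    for name in SLIDE_38_NAMES
]


def children_of(parent: str, rels: List[Relationship]) -> List[str]:
    """Return the sorted names of all companies claimed as subsidiaries of parent."""
    return sorted(r.child for r in rels if r.parent == parent)


if __name__ == "__main__":
    for name in children_of("Facebook, Inc", relationships):
        print(name)
```

Keeping `source` on every record means a later investigation (slides 39 to 42) can add relationships found elsewhere without losing track of which claim came from which document.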