Open data for journalists
         How it’s useful, why it matters




             Chris Taggart, OpenCorporates, NICAR, Feb 2012
About OpenCorporates

Now over 36 million companies in 47 jurisdictions
Including 22 US states
A simple (but huge) goal: an
entry for every corporate
legal entity in the world
Based on the company number and jurisdiction
(no monopoly id)
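
One way to picture an identifier built from jurisdiction plus company number is a composite key. The sketch below is illustrative only: the class name, the lowercase jurisdiction codes and the sample company number are assumptions, not taken from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityKey:
    """Hypothetical composite key: jurisdiction code plus registry number."""
    jurisdiction: str    # e.g. "gb" or "us_de" (assumed code format)
    company_number: str  # kept as a string: registry numbers can have leading zeros

    def __str__(self) -> str:
        return f"{self.jurisdiction}/{self.company_number}"


def make_key(jurisdiction: str, company_number: str) -> EntityKey:
    # Normalise case and whitespace so the same entity always gets the same key.
    return EntityKey(jurisdiction.strip().lower(), company_number.strip())


# The same number in two jurisdictions identifies two different entities.
uk = make_key("GB", "00123456")
de = make_key("us_de", "00123456")
```

Because the key needs nothing beyond what each registry already issues, no single body has to hand out identifiers.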
The simple search

Not to be underestimated
Massively reduces friction
(how long will it take you
to find and search
multiple jurisdictions)
Allows what if questions
Potentially generates
stories in its own right
Source for additional info

 Addresses, filings,
 status, websites...
 Intl trademarks, UK
 govt spending,
 official notices,
 health & safety...
 Other IDs: SEC,
 CAGE, charity....
 Coming soon:
 lobbying registers
Reconciliation
(matching names to legal entities)


Cleans up messy company names (& previous names) to legal entity,
and from there to other data
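
As a rough illustration of what matching names to legal entities can involve, the sketch below normalises names and looks them up in a small index. It is not OpenCorporates' reconciliation method: the suffix list, the function names and the toy register are all assumptions made for the example.

```python
import re

# Assumed list of legal-form suffixes; a real list would be much longer
# and would vary by jurisdiction.
LEGAL_SUFFIXES = {"ltd", "limited", "plc", "inc", "incorporated", "llc", "corp", "corporation"}


def normalise(name: str) -> str:
    """Lowercase, strip punctuation and drop trailing legal-form words."""
    words = re.sub(r"[^a-z0-9 ]+", " ", name.lower().replace("&", " and ")).split()
    while words and words[-1] in LEGAL_SUFFIXES:
        words.pop()
    return " ".join(words)


def build_index(entities):
    """Map every normalised current or previous name to its entity key."""
    index = {}
    for key, names in entities.items():
        for name in names:
            index[normalise(name)] = key
    return index


# Toy register: key -> [current name, previous names...]; all values are invented.
register = {
    "gb/00000001": ["Acme Widgets Limited", "Acme Trading Co. Ltd"],
    "us_de/0000002": ["Bolt & Nut, Inc."],
}
index = build_index(register)


def reconcile(messy_name: str):
    """Return the entity key for a messy name, or None if nothing matches."""
    return index.get(normalise(messy_name))
```

Once a messy name resolves to a key, that key can be used to pull in the other datasets attached to the same legal entity. Exact matching after normalisation is the simplest possible approach; real data would likely need fuzzy matching as well.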
The database/platform




API: allows all
information to be
retrieved as data,
even searches
Why care about open data?
the freedom argument
Information is the currency
          of democracy
                                                       Thomas Jefferson*




* This quote has also been attributed to Ralph Nader
DATA (overwriting "Information") is the currency
          of democracy
                                                       Thomas Jefferson*

We live in a big data world. Our lives are not just governed by data, they are data

The biggest databases in the world are private not governmental – social networks, search engines, finance companies, supermarkets...

These are enriched by public data that are only available to purchase (getting worse in USA)

* This quote has also been attributed to Ralph Nader
the journalism argument
Good journalism = data journalism
But the data is complex
Split across multiple datasets
The links are not clear and often redacted
And without both access and the right to reuse the data, you can’t even begin to make sense of it
Even then, it’s HARD
But that’s why it’s worth doing
In short, this only works:

    SELECT companies.*
    FROM companies
    INNER JOIN government_suppliers
      ON companies.id = government_suppliers.company_id
    INNER JOIN directors
      ON companies.id = directors.company_id
    WHERE government_suppliers.total_received > 5000000
      AND directors.convicted_tax_evader = true

if you can get the data
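
To make the query concrete, here is a minimal sketch that runs it against an in-memory SQLite database. The table and column names come from the slide; the column types, the extra `name` columns and every sample row are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE companies (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE government_suppliers (company_id INTEGER, total_received INTEGER);
CREATE TABLE directors (company_id INTEGER, name TEXT, convicted_tax_evader INTEGER);

-- Invented sample rows, for illustration only.
INSERT INTO companies VALUES (1, 'Alpha Ltd'), (2, 'Beta Ltd'), (3, 'Gamma Ltd');
INSERT INTO government_suppliers VALUES (1, 9000000), (2, 7000000), (3, 1000);
INSERT INTO directors VALUES (1, 'Director A', 1), (2, 'Director B', 0), (3, 'Director C', 1);
""")

# The query from the slide; SQLite stores booleans as 0/1, so `= true` becomes `= 1`.
QUERY = """
SELECT companies.* FROM companies
INNER JOIN government_suppliers ON companies.id = government_suppliers.company_id
INNER JOIN directors ON companies.id = directors.company_id
WHERE government_suppliers.total_received > 5000000
  AND directors.convicted_tax_evader = 1
"""

rows = conn.execute(QUERY).fetchall()
print(rows)  # [(1, 'Alpha Ltd')]
```

Only the company that is both a large supplier and has a flagged director comes back. A company with several matching directors would appear once per director; `SELECT DISTINCT` would collapse those rows. The join itself is the easy part: it depends on all three datasets being available and sharing a company identifier.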
