It's not the documents; it's the DATA!
Tom Johnson, Managing Director, Institute for Analytic Journalism, Santa Fe, New Mexico, USA
tom@jtjohnson.com
This PowerPoint deck and tipsheet are posted at: http://johnson-fog.notlong.com
Licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License.
1. Nothing is as important, or as valuable, as a good theory!
Theory of Journalistic Process
2.
Bertillon system: Public Records DB
Early "hard drives," data retrieval and data analysis of public records
Traditional Data In     Analysis      Info Out Data In     Analysis    Info Out ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Digital Age Data In     Analysis      Info Out ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Digital Age Data In     Analysis      Info Out ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],} ,[object Object],[object Object]
3.
Four stories
Journalism and GIS
Hurricane Andrew + damage reports + building inspection = jail terms
Doig: Hurricane Andrew
Analysis with real data: search and sort DB info
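The two operations named on this slide, search and sort, are the core of most records analysis. A minimal sketch of both using Python's built-in `sqlite3` module, assuming a hypothetical `inspections` table; the function name, columns, and sample rows are invented for illustration and are not data from the stories above.

```python
import sqlite3

def search_and_sort(rows, keyword):
    """Find records whose description contains keyword, largest damage first.

    rows: list of (address, description, damage_usd) tuples (hypothetical schema).
    """
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    conn.execute(
        "CREATE TABLE inspections (address TEXT, description TEXT, damage_usd INTEGER)"
    )
    conn.executemany("INSERT INTO inspections VALUES (?, ?, ?)", rows)
    # Search: parameterized LIKE match; Sort: ORDER BY damage, descending
    cur = conn.execute(
        "SELECT address, damage_usd FROM inspections "
        "WHERE description LIKE ? ORDER BY damage_usd DESC",
        (f"%{keyword}%",),
    )
    result = cur.fetchall()
    conn.close()
    return result

# Hypothetical sample records, invented for this sketch
sample = [
    ("12 Palm St", "roof failure", 40000),
    ("7 Bay Rd", "window damage", 5000),
    ("3 Oak Ave", "roof torn off", 90000),
]
print(search_and_sort(sample, "roof"))
```

Using a parameterized query (`?`) rather than pasting the keyword into the SQL string keeps the search safe when the search term comes from user input.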
Vanishing Wetlands
UK MPs' expenses
Solid search tools, but what you get after the search are PDFs
Major questions?
Files, Transparency, Ease of Analysis (chart ranging from "Easier" to "Challenging")
Data In: Objectives/Requirements
Data In: "Typical" problems with government sites
Good NM sites: Search! Español! Feedback!
NM Legislature Bill Finder
Could be better: there is no way to find which bills were introduced by legislator X.
You can download a bill in TWO formats.
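If the bill records were available as structured data rather than only as individual documents, the missing "bills by legislator" lookup would be a simple group-by. A sketch assuming hypothetical records with `bill` and `sponsor` fields; the function name, bill numbers, and legislator names are invented placeholders.

```python
from collections import defaultdict

def bills_by_sponsor(records):
    """Group bill numbers by sponsor.

    records: iterable of dicts with hypothetical 'bill' and 'sponsor' keys.
    Returns a dict mapping each sponsor to their bills, in input order.
    """
    index = defaultdict(list)
    for rec in records:
        index[rec["sponsor"]].append(rec["bill"])
    return dict(index)

# Hypothetical records, invented for this sketch
bills = [
    {"bill": "HB 12", "sponsor": "Legislator A"},
    {"bill": "SB 3", "sponsor": "Legislator B"},
    {"bill": "HB 7", "sponsor": "Legislator A"},
]
print(bills_by_sponsor(bills)["Legislator A"])
```

The point is less the code than the data: once a site publishes records instead of only documents, questions the site's designers never anticipated become a few lines of work.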
Data In: Challenges
Data In: Challenges in SunshinePort
Bottom line on SunshinePortalNM.com
"This is not even a web page, it's a Flash application, so there's not going to be much sunlight escaping from this portal."
"A perfect example of creating the appearance of transparency without actually being transparent."
Good data sites: government and NGO
Common aspects?
Challenge for watchdogs?
Tomorrow?
Public Access to Original Data
Impact
Why not?
Thank you all!
FOI history
Early police database: incomplete data
Source: Jay, Ricky. "Grifters, Bunco Artists & Flimflammen." Wired, Feb. 2011, p. 88. http://rickyjay.com/
NM HB 406
But what if it wasn't New Mexico state employees who were directly at fault?
Analytic Tools
"Analytic tools" also for storytelling
FOIA (b)(3) Exemptions
Original: http://www.propublica.org/article/foia-exemptions-sunshine-law
Content Analysis
Content analysis of legislative party text
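One of the simplest content-analysis techniques is counting which words a text uses most. A minimal sketch, assuming plain-text input; the function name, the tiny stop-word list, and both sample strings are invented placeholders rather than real party texts, and serious work would use a fuller stop-word list and stemming.

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption; real lists are much longer)
STOPWORDS = frozenset({"the", "and", "of", "to", "a", "for", "we"})

def top_words(text, n=3):
    """Return the n most frequent non-stop-words in text, ignoring case."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return counts.most_common(n)

# Invented sample texts standing in for two parties' statements
sample_a = "We will cut taxes. Taxes are too high, and jobs matter. Jobs, jobs, jobs."
sample_b = "Schools and health care for every family. Fund schools, protect health care."

print(top_words(sample_a, 2))
print(top_words(sample_b, 3))
```

Comparing the top words side by side gives a rough first look at what each text emphasizes; it is a starting point for reporting, not a conclusion.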
Positive example of government data
Same data available in two formats!
“ Data In” questions Data In     Analysis    Info Out ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
“ Data In” questions Data In     Analysis    Info Out ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
[object Object],[object Object],[object Object],[object Object],[object Object]
“ Analysis” phase Data In     Analysis    Info Out ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
“ Analysis” phase Data In     Analysis    Info Out ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Data In     Analysis      Info Out Data In     Analysis    Info Out ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Data In     Analysis      Info Out Data In     Analysis     Info Out ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Theory of Journalistic Process (Copyright © J. T. Johnson)
Diagram: Data In → Analysis → Info Out, with "Info Out" illustrated by a mock story: "This is a headline. DATELINE: And the traditional text story starts here and goes on and on and on."
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 

Recently uploaded (20)

Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 

It's not the documents; it's the DATA

Editor's Notes

  1. Show of hands: How many journalists? How many from NGOs? How many lawyers (you can double count yourself)? How many students?
  2. Theory of PROCESS: the same for journalists, lawyers, and socially concerned citizens, up to the point of “Information Out.” There we might write and present somewhat differently for different audiences.
  3. While sometimes an individual document can be important, today seeing the patterns of behaviors is usually just as important and insightful if not more so.
  4. Early public records: intricate data collection; potential for error in data entry; potential for error in filing; no machine retrieval or analysis. Even today, OCR would be impossible.
Source: http://cultureandcommunication.org/deadmedia/index.php/Bertillon_System
This rare Bertillon Card (named after the inventor of anthropometry).
Decline of Bertillonage: Fingerprint killed the Bertillon star.
The complexity of the Bertillon system, the very thing that provided it with such accurate and reliable data, also proved to be its downfall: it was simply too cumbersome to replicate with sufficient accuracy. As soon as Bertillon’s procedures began to be disseminated outside of Paris there were problems; as Cole explains: “Learning the system from translated books, far from the exacting presence of Bertillon himself, identification clerks seldom replicated the rigor that characterized operations in Paris. Instead, they skimped on learning the morphological vocabulary, glossed over the precise movements in the measuring process, and contented themselves with sloppily recording a few measurements. Worse, most identification bureaus, too proud to simply adopt Bertillon’s system wholesale, took it upon themselves to modify various aspects of the system.” (Cole 2001, 52)
Bertillon anticipated these problems, writing a strongly worded message in his instruction manual directed towards all those who would consider meddling with his finely tuned methods: “The arrangement of these instruments was the subject of many experiments and numberless improvements before they reached their present shape, which we consider as final. So we reject in advance every modification, every further change, however slight, either in their form or in their manner of using them. That is a great temptation for beginners, to whom numerous new ideas occur, but who are not aware that all these ideas, even those that they believe to be the most original, the most personal, have already been proposed by others, tried and finally rejected for divers reasons.” (Bertillon 1896, 19)
Alas, Bertillon’s warnings were not heeded, and the accuracy of anthropometric measurements, and the reputation of the system as a whole, suffered as a result. Even if the integrity of Bertillon’s system could be sustained outside of Paris, it was soon to be overtaken by another form of criminal identification. As Kaluszynski notes, “at the last moment before it seemed likely to dominate the future, anthropometry was to undergo a rude shock. Its success had barely been established and savored when its supremacy began to falter in the face of a new and infallible technique” (2001, 128). Of course, the new technique was fingerprinting, a much simpler process than Bertillonage. “A fingerprint is a physical sign that cannot be falsified or disguised, and the mathematical likelihood of two individuals having identical fingerprints is infinitely small” (128). Occam’s razor would dictate that fingerprinting soon supplant Bertillonage as the world-wide standard for criminal identification.
  5. By 1910… the indexing system has improved; typewriters instead of pen; better haircuts. But still… null fields; subject to data-entry errors; lost or misfiled cards/data; limited large-scale analysis resources.
  6. Early “hard drives,” data retrieval and data analysis of public records
  7. A public record, but one of limited usage. A DOCUMENT, but no efficient, productive, insightful way to FIND the data. A DOCUMENT, but no efficient, productive, insightful way to EXTRACT the data. Sorta like a PDF.
  8. All data today requires NEW tools for ANALYSIS and STORY-TELLING. Statutes are usually adequate; the CULTURES are the challenge. That means both the culture of politicians and bureaucrats AND the culture of traditional journalism, which reports the event, not the issue, and lacks contemporary analytic skills.
  9. Waite: The government doesn't know how many acres of Florida wetlands have been destroyed in the past 15 years. No state or federal agency has kept track, not even the Army Corps of Engineers, which has the final say on protecting wetlands. Another federal agency, the National Wetlands Inventory, is supposed to track losses nationwide. The tiny agency, based in St. Petersburg, mapped Florida's wetlands 20 years ago, but hasn't updated its maps except for two of Florida's 67 counties. So the St. Petersburg Times examined satellite images of Florida to determine the loss of wetlands. Satellite images taken in the late 1980s were compared with those taken in 2003. Those were combined with data from the National Wetlands Inventory and the state Fish and Wildlife Conservation Commission. A random sample of 385 places on the resulting maps was checked against other data through property records, aerial images and site visits. No satellite image analysis can be 100 percent accurate, particularly one covering such a broad area. In this case, the accuracy was about 85 percent, the level required by the U.S. Geological Survey for similar satellite analysis. To filter out temporary changes from long-lasting ones, the analysis relied on a map of urbanization created by the state wildlife agency. That showed about 84,000 missing acres of wetlands. The methodology was reviewed by Barnali Dixon, professor of geography at the University of South Florida; Leonard G. Pearlstine, assistant scientist at the University of Florida's Fort Lauderdale Research and Education Center; and Tom Lillesand, professor of geography and director of the Environmental Remote Sensing Center at the University of Wisconsin-Madison. [Last modified December 14, 2006, 18:10:27]
UK: The Expenses Files, MPs' Expenses. A year ago the High Court backed an earlier ruling by the Information Tribunal that full details of MPs' expenses, including receipts, should be made public. Since then MPs have been accused of dragging their feet and playing for time. Full details are slated to be published in July but with some crucial details, such as addresses of second homes, blacked out. An investigation by the Telegraph has uncovered the full files.
The Guardian: Join us in digging through the documents of MPs' expenses to identify individual claims, or documents that you think merit further investigation. You can work through your own MP's expenses, or just hit the button below to start reviewing. (Update, Fri pm: we now have a virtually complete set of expenses documents so you should be able to find your MP's) “We have 458,832 pages of documents. 27,731 of you have reviewed 223,475 of them. Only 235,357 to go.”
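The note does not say why the Times checked exactly 385 places. As a hedged aside, 385 is what the textbook formula for estimating a proportion gives at 95 percent confidence and a margin of plus or minus 5 points. That result assumes the most conservative p = 0.5 and no finite-population correction. The sketch below (function names are hypothetical) shows that arithmetic and the margin around an observed 85 percent accuracy. It is not the Times' documented method.

```python
import math


def sample_size(margin: float, z: float = 1.96, p: float = 0.5) -> int:
    """Simple-random-sample size needed to estimate a proportion within +/- margin.

    z = 1.96 corresponds to 95 percent confidence; p = 0.5 is the worst case.
    No finite-population correction is applied.
    """
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


def margin_of_error(p_hat: float, n: int, z: float = 1.96) -> float:
    """Normal-approximation margin of error for an observed proportion p_hat."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)


n = sample_size(0.05)
print(n)  # 385
print(round(margin_of_error(0.85, n), 3))  # 0.036, i.e. roughly +/- 3.6 points
```

Whether a margin of about 3.6 points around the 85 percent figure is acceptable remains an editorial judgment, not a statistical one.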
  10. http://www.flickr.com/photos/juggernautco/2844066535/in/photostream/
  11. The reporter harvested the data and cleaned and verified it; a team then produced the story for multiple delivery platforms. But it all started with the data, some of which probably never existed as an ink-on-paper DOCUMENT. Project homepage: http://www.azcentral.com/news/articles/2010/11/12/20101112arizona-pension-funds.html State pension fund tried suing 'Republic': http://www.azcentral.com/news/articles/arizona-pension-funds-records.html
  12. Waite: The government doesn't know how many acres of Florida wetlands have been destroyed in the past 15 years. No state or federal agency has kept track, not even the Army Corps of Engineers, which has the final say on protecting wetlands. Another federal agency, the National Wetlands Inventory, is supposed to track losses nationwide. The tiny agency, based in St. Petersburg, mapped Florida's wetlands 20 years ago, but hasn't updated its maps except for two of Florida's 67 counties. So the St. Petersburg Times examined satellite images of Florida to determine the loss of wetlands. Satellite images taken in the late 1980s were compared with those taken in 2003. Those were combined with data from the National Wetlands Inventory and the state Fish and Wildlife Conservation Commission. A random sample of 385 places on the resulting maps was checked against other data through property records, aerial images and site visits. No satellite image analysis can be 100 percent accurate, particularly one covering such a broad area. In this case, the accuracy was about 85 percent, the level required by the U.S. Geological Survey for similar satellite analysis. To filter out temporary changes from long-lasting ones, the analysis relied on a map of urbanization created by the state wildlife agency. That showed about 84,000 missing acres of wetlands. The methodology was reviewed by Barnali Dixon, professor of geography at the University of South Florida; Leonard G. Pearlstine, assistant scientist at the University of Florida's Fort Lauderdale Research and Education Center; and Tom Lillesand, professor of geography and director of the Environmental Remote Sensing Center at the University of Wisconsin-Madison. [Last modified December 14, 2006, 18:10:27]
UK: The Expenses Files, MPs' Expenses. A year ago the High Court backed an earlier ruling by the Information Tribunal that full details of MPs' expenses, including receipts, should be made public. Since then MPs have been accused of dragging their feet and playing for time. Full details are slated to be published in July but with some crucial details, such as addresses of second homes, blacked out. An investigation by the Telegraph has uncovered the full files.
The Guardian: Join us in digging through the documents of MPs' expenses to identify individual claims, or documents that you think merit further investigation. You can work through your own MP's expenses, or just hit the button below to start reviewing. (Update, Fri pm: we now have a virtually complete set of expenses documents so you should be able to find your MP's) “We have 458,832 pages of documents. 27,731 of you have reviewed 223,475 of them. Only 235,357 to go.”
  13. An early, major example of crowd-sourced analysis: a “wetware” content-analysis tool. Text data AND PDF, but the connection to the PDF is, SORTA, the end result.
  14. Range of file “states/forms”; range of the challenge in extracting and analyzing the data. “JSON is an important standard for ease of interaction across systems. It's becoming the preferred route over XML in many cases. And as geo-spatial data explodes, addressing the standards there might be helpful. I would include KML, GeoJSON and SHP files for vector and many options for raster: bil, netCDF, ECW, GeoTIFF, etc.” (Guerin)
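To make Guerin's point about formats concrete, here is a minimal sketch that turns a small CSV table with latitude/longitude columns into a GeoJSON FeatureCollection, using only Python's standard library. The column names `lat` and `lon` and the sample row (approximate coordinates for Santa Fe) are assumptions for illustration. GeoJSON lists longitude before latitude, which is a common source of misplaced points.

```python
import csv
import io
import json


def rows_to_geojson(csv_text: str) -> dict:
    """Convert CSV rows with assumed 'lat'/'lon' columns into a GeoJSON FeatureCollection."""
    features = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        lon, lat = float(row.pop("lon")), float(row.pop("lat"))
        features.append({
            "type": "Feature",
            # GeoJSON orders coordinates longitude first, then latitude.
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": row,  # remaining columns, kept as text
        })
    return {"type": "FeatureCollection", "features": features}


sample = "name,lat,lon\nSanta Fe,35.687,-105.938\n"
print(json.dumps(rows_to_geojson(sample), indent=2))
```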
  15. And even these are NOT perfect; we have to know some of the underlying assumptions inherent in these file types. That said, this is still the best point of departure when seeking to acquire files and their data. Just as an example, CSV does not preserve leading zeros in a field that gets read as numeric, so my ZIP would collapse from 02151 to 2151. Or the field would be represented as text, "02151" (surrounded by quote marks). Some translation programs do that automatically, but there is no standard. Same problem with phone numbers, some equations, etc. CSV also assumes field headers are on one line. They need to be in one cell in Excel to translate correctly that way. Often they are not, or the Excel file has multiple levels of heads. XML is the general link format people want to use, but not all states have adopted it or a standard schema. Yeah, the CSV standard does not even allow a blank row or a formatting row (like ---------) between the header and the live data table. The format row is usually read as a zero, not null, and that screws up averages, medians and so forth. Excel "cheats" on calculating medians, etc. (SSR) Should be ANSI standard CSV (SSR)
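The following is a minimal sketch of two of the pitfalls described above, using Python's standard `csv` module on a made-up file. Reading every field as text keeps the ZIP code's leading zero. A decorative `-----` row is skipped rather than parsed, and a blank amount is treated as missing rather than zero. The sample data and the separator-detection rule are assumptions, not part of any CSV standard.

```python
import csv
import io
import statistics

# Made-up sample: a ZIP with a leading zero, a decorative separator row,
# and one blank amount.
RAW = """zip,amount
-----,------
02151,100
87501,
87505,300
"""


def load_rows(text: str) -> list[dict]:
    """Read CSV text as strings, skipping rows made only of dashes."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        # A row is decorative if every field is a non-empty run of dashes.
        if all(v and set(v) <= {"-"} for v in row.values()):
            continue
        rows.append(row)
    return rows


rows = load_rows(RAW)
zips = [r["zip"] for r in rows]  # kept as text, so "02151" survives
amounts = [float(r["amount"]) for r in rows if r["amount"].strip()]

print(int("02151"))                # 2151: what a numeric conversion does to the ZIP
print(zips)                        # ['02151', '87501', '87505']
print(statistics.median(amounts))  # 200.0 with the blank left out
print(statistics.median([100.0, 0.0, 300.0]))  # 100.0 if the blank were read as zero
```

The last line shows the distortion SSR describes: counting a missing value as zero moves the median.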
  16. Move data from “out there” to the analytic site/tools. Look for connections and patterns.
  17. Seek fine-grained data, NOT aggregations. Seek data in original form (i.e., NO PDFs). Get data in a lowest-common-denominator format: comma-delimited files in ASCII or text. Who collected the data? Why? How? Who proofed/edited the data? Why? How? If from a database, first ask for the “record layout,” “code sheet,” or “schema”: definitions of variables or fields. Constant or ???
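When an agency delivers a fixed-width file, the record layout tells you where each field starts and stops, and the code sheet tells you what the codes mean. The sketch below assumes a hypothetical layout with 1-based, inclusive column positions and a hypothetical code sheet. Real layouts vary in how they number columns, so check that convention first.

```python
# Hypothetical record layout: (field name, first column, last column), 1-based, inclusive.
LAYOUT = [
    ("county_code", 1, 3),
    ("zip", 4, 8),
    ("amount", 9, 16),
]

# Hypothetical code sheet mapping county codes to names.
COUNTY_CODES = {"001": "County A", "002": "County B"}


def parse_record(line: str, layout=LAYOUT) -> dict:
    """Slice one fixed-width line into named text fields."""
    record = {name: line[first - 1:last].strip() for name, first, last in layout}
    # Decode via the code sheet; flag codes the sheet doesn't define.
    record["county"] = COUNTY_CODES.get(record["county_code"], "UNKNOWN CODE")
    return record


print(parse_record("00102151  125.50"))
# {'county_code': '001', 'zip': '02151', 'amount': '125.50', 'county': 'County A'}
```

Keeping every field as text at this stage preserves values like the leading zero in "02151"; convert amounts to numbers only after checking them.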
  18. Barriers to data = barriers to analysis: NO site search capability; no site map. Failure to use open-standard HTML; use of the closed-standard Adobe Flash/Shockwave environment instead. Page formats/layouts not consistent; too many drill-downs instead of search-driven generators. Jiggly roll-overs; too much effort spent on bling. Impossible to download or scrape data for analysis. Information available only in Adobe PDF files, which are notoriously unfriendly to data analysis.
  19. A State of NM government agency develops a creditable website: search engine; choice of Spanish; opportunity for feedback; registered, i.e., OWNED, by the CITIZENS of New Mexico. Award: Digital Government Achievement Awards http://www.centerdigitalgov.com/survey/88/2010
  20. Another relatively valuable NM state website: clean site design; search engine; quick links to the actual document in Word format and PDF.
  21. No search engine. No Spanish version. Head-scratching logic in the taxonomy of the silos. Why the roll-overs that don’t do much? If we drill down into “Capital Outlay” (top-left menu) we end up with a 70-page PDF. Again.
  22. Go to these sites and, if lucky, find the document we want… But they are all, and only, PDFs. PDFs can be retrieved and saved, one at a time, to your desktop. Apps are available that OCR what is probably the output of an Excel document. But that shows up on two partial pages, which just adds more time and effort. After extracting to Excel, the data must then be closely copy-edited to make sure the extraction process read every zero as a zero and not a capital letter “Oh.” Another interesting problem: screwy, idiosyncratic fonts.
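The zero-versus-“Oh” check can be partly automated before the human copy edit. This is a hedged sketch. The list of confusable characters is an assumption. So is the rule of only touching cells that already contain a digit and nothing but number-like characters. The function returns a flag, so every change still gets a human look rather than being silently accepted.

```python
import re

# Assumed OCR confusions: letters that often stand in for digits.
CONFUSABLES = str.maketrans({"O": "0", "o": "0", "l": "1", "I": "1"})

# A cell is "number-like" if it holds only digits, commas, periods, a minus sign,
# a dollar sign, spaces, or the confusable letters above.
NUMBER_LIKE = re.compile(r"^[0-9,.\-$ OolI]+$")


def repair_numeric_cell(cell: str) -> tuple[str, bool]:
    """Return (possibly repaired cell, True if anything was changed)."""
    if not NUMBER_LIKE.match(cell) or not re.search(r"[0-9]", cell):
        return cell, False  # leave words and letter-only cells alone
    fixed = cell.translate(CONFUSABLES)
    return fixed, fixed != cell


for cell in ["1,2O5", "30O", "Oslo", "450"]:
    print(cell, repair_numeric_cell(cell))
# 1,2O5 ('1,205', True)
# 30O ('300', True)
# Oslo ('Oslo', False)
# 450 ('450', False)
```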
  23. It is possible to build a good-looking site, integrating Flash technology if desired, while still making the underlying structured data directly available to users. A good example is our election day results files (http://elections.nytimes.com/2010/results/house). If you view the source markup for this page, which includes very sharp-looking Flash elements, you'll find an embedded URL: http://elections.nytimes.com/2010/results/house.tsv. That is a link to a tab-delimited file containing the data underlying the map. --Griff Palmer
From: James Jennings <[email_address]>
Date: Monday, February 14, 2011
Subject: Ease of scraping this site?
To: chris feola <[email_address]>
1-The entire site is in Flash. I might be able to pipe some of the search data to a CSV, but not everything is searchable. This is the best job of making public data as inaccessible as possible that I have ever seen. It is a masterwork. I would call and just ask them to send it all in a spreadsheet and see what happens. jj
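Palmer's tip is to view the page source to find the data file behind a Flash graphic. It can be sketched in a few lines of Python: scan the markup for links ending in `.tsv` or `.csv`, then read the tab-delimited text with the standard `csv` module. The page snippet, URL and column names below are hypothetical stand-ins, since the 2010 NYT address may no longer resolve. Fetching a live page is left out so the example runs offline.

```python
import csv
import io
import re

# Matches absolute http(s) URLs ending in .tsv or .csv inside page markup.
DATA_LINK = re.compile(r"""https?://[^\s"'<>]+\.(?:tsv|csv)\b""")


def find_data_links(html: str) -> list[str]:
    """Return data-file URLs embedded anywhere in the page source."""
    return DATA_LINK.findall(html)


def parse_tsv(text: str) -> list[dict]:
    """Parse tab-delimited text into one dict per row, keyed by header."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


# Hypothetical page source with a data URL tucked into a Flash embed.
page = '<embed src="map.swf" flashvars="data=http://example.com/2010/results/house.tsv">'
print(find_data_links(page))  # ['http://example.com/2010/results/house.tsv']

sample = "district\twinner\nNM-1\tCandidate A\n"
print(parse_tsv(sample))  # [{'district': 'NM-1', 'winner': 'Candidate A'}]
```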
  24. Same notes as slide 23.
  25. Examples of good GOVERNMENT sites.
  26. What do we see on all these GOOD gov’t sites? All have up-front search capabilities; all are written in “data-accessible” code; all data can be downloaded with “relative” ease; some have various languages available; ALL are run by GOVERNMENT, with no commercial sites.
  27. Failure on the part of planners/bureaucrats to simply… Give The People THEIR Data… In The Most Basic, Original, Straightforward Form… And Let Them Figure Out What Should Be Done With It! The governor agrees
  28. See the “Public Online Information Act”: http://sunlightfoundation.com/policy/poia/ The state has had, and has employed for multiple years, the Fiscal Impact Report, which is required to be attached to EVERY bill introduced in the Legislature. In the 21st century, if we are to have not just citizens participating in democracy in an informed manner but also economic growth, we should have a requirement for every bill introduced that asks: How will this bill advance the public’s access to the ORIGINAL FORM data related to this bill and topic?
  29. https://secure.wikimedia.org/wikipedia/en/wiki/Public_records
Historic perspective: The concept of public records first emerged in western Europe in the late Middle Ages. Some of the first public records were census records, birth, burial, and marriage records such as the Domesday Book (1085-6) of William the Conqueror [2] and royalty marriage agreements, which were perceived as international treaties brokered by private parties. In the United Kingdom, the Public Record Office Act was passed in 1838. [3] Of particular significance was the evolution of the common law right "to access court records to inspect and to copy". The expectation inherent in the common law right to access court records is that any person may come to the office of the clerk of the court during business hours and request to inspect court records, with almost instantaneous access. Such right is a central safeguard for the integrity of the courts. Any decision to conceal court records requires a sealing order. The right to access court records is also central to liberty: There is no conceivable way to exercise the Habeas Corpus right, deemed by the late Justice Brennan [4] as "the cornerstone" of the United States Constitution, absent access to court records as public records. In the United States the common law right to "access court records to inspect and to copy" was re-affirmed by the US Supreme Court in Nixon v. Warner Communications, Inc. (1978), where the court found various parts of the right to access court records as inherent to the First, Fourth, Sixth, and Fourteenth Amendments. Therefore, in the United States, access to court records is governed by Civil Rights in the Amendments to the United States Constitution, not by the Freedom of Information Act.
Public records in the United States: Access to public records in the US at the federal level is guided by the Freedom of Information Act (FOIA). Requests for access to records pursuant to FOIA are often frustrated by federal agencies through the numerous exemptions found in the law, and through redaction of critical data. Each state has its own version of FOIA. For example, in Colorado there is the Colorado Open Records Act [5] (CORA) and in New Jersey the law is known as the Open Public Records Act [6] (OPRA). There are many degrees of accessibility to public records between states, with some making it fairly easy to request and receive documents, and others with many exemptions and restricted categories of documents. One state that is fairly responsive to public records requests is New York, which utilizes the Committee on Open Government [7] to assist citizens with their requests. A state that is fairly restrictive in how it responds to public records requests is Pennsylvania, where the law currently presumes that all documents are exempt from disclosure [8], unless they can be proven otherwise. The California Public Records Act (California Government Code §§6250-6276.48) covers the arrest and booking records of inmates in the State of California jails and prisons, which are not covered by First Amendment rights. Public access to arrest and booking records is seen as a critical safeguard of Liberty.
  30. The ThemeRiver™ visualization helps users identify time-related patterns, trends, and relationships across a large collection of documents. The themes in the collection are represented by a "river" that flows left to right through time. The river widens or narrows to depict changes in the collective strength of selected themes in the underlying documents. Individual themes are represented as colored "currents" flowing within the river. The theme currents narrow or widen to indicate changes in individual theme strength at any point in time.
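The note does not describe how ThemeRiver measures theme strength. The following is therefore only a hedged sketch of the kind of table such a chart stacks: for each time period, a count of keyword hits per theme. Keyword matching, the theme dictionary and the sample documents are all assumptions for illustration, not ThemeRiver's method.

```python
from collections import Counter, defaultdict


def theme_strengths(docs, themes):
    """Count keyword hits per theme for each time period.

    docs:   iterable of (period, text) pairs
    themes: dict mapping theme name -> set of lowercase keywords
    Returns {period: Counter({theme: hits})}; each Counter is one
    vertical slice of a ThemeRiver-style chart. Words are split on
    whitespace only, so punctuation is not stripped.
    """
    strengths = defaultdict(Counter)
    for period, text in docs:
        words = text.lower().split()
        for theme, keywords in themes.items():
            strengths[period][theme] += sum(w in keywords for w in words)
    return dict(strengths)


docs = [
    ("2009", "Budget cuts and budget deficits"),
    ("2010", "Pension fund lawsuit over pension records"),
]
themes = {"budget": {"budget", "deficits"}, "pensions": {"pension", "fund"}}

for period, counts in sorted(theme_strengths(docs, themes).items()):
    print(period, dict(counts))
# 2009 {'budget': 3, 'pensions': 0}
# 2010 {'budget': 0, 'pensions': 3}
```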
  31. Original source: http://www.propublica.org/article/foia-exemptions-sunshine-law NB: If one picks the CIA, for example, you get a “virtual” webpage, NOT an actual document. You can drill down into viewing the PDF, but it is a secondary result of the search, not the primary result.
  32. “DATA,” upon analysis, becomes information. “DATA” is sensual, qualitative and quantitative: the smell of a forest fire, the expression of an interviewee’s feelings, a copy of the state budget or a bill marked up by committee. The quality of the “Information Out” can be no better than the DATA that goes in (and that means Research and Reporting) and the Analysis applied to that high-quality data. In the Infosphere, “Information” is often released back into the Infosphere to become “DATA” for some other species’ or colleagues’ use.