Data 2: Interrogating,
    visualising, mashing



   Online Journalism
   City University
   Paul Bradshaw
Monday, 7 March 2011
Themes


   5 things you need to know about each
   Data journalism in action
   Walkthrough



Interrogating data




5 things you need to know about
    interrogating data

   1. Data always needs cleaning up
   2. Treat the ‘source’ like a source
   3. Use the right ‘average’ and
   percentage
   4. Variation over time & space: context
   5. Spreadsheet tools are your friend -
   but always back up copies
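
Point 3 above can be illustrated with a minimal sketch. The salary figures and the 4% to 5% rate are invented for illustration; the aim is only to show how the mean and median diverge when one value is extreme, and how a change in percentage points differs from a percentage change.

```python
from statistics import mean, median

# Hypothetical salaries in pounds; one very high earner skews the mean.
salaries = [18000, 21000, 22000, 24000, 250000]

mean_salary = mean(salaries)      # pulled upward by the single outlier
median_salary = median(salaries)  # the middle value when sorted


def percentage_change(old, new):
    """Relative change from old to new, as a percentage of old."""
    return (new - old) / old * 100


# A rate rising from 4% to 5% is up 1 percentage point,
# but that is a 25% increase relative to where it started.
point_change = 5 - 4
relative_change = percentage_change(4, 5)

print(mean_salary, median_salary, point_change, relative_change)
```

Which 'average' is right depends on the story: with skewed data such as pay, the median is usually closer to what a typical person experiences.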
“What the Independent have done
 is confuse the UK’s deficit with our
 debt [making] the debt problem
 look around eight times worse than
 it is. And it used the whole of its
 front page to do so.”

                        - James Ball
What is the data worth?


   Measurement doesn't answer anything if
   there's only one variable
   Statistical significance
   Sample size and selection
   Controls and the placebo effect
   Read up.
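
As a rough illustration of why sample size matters, the sketch below uses the standard approximation for the 95% margin of error of a proportion from a simple random sample. It assumes random selection, which many real surveys do not achieve, so treat it as a starting point rather than a test of any particular dataset.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Approximate margin of error for a proportion p from a simple
    random sample of size n (z=1.96 corresponds to 95% confidence)."""
    return z * math.sqrt(p * (1 - p) / n)


# The same 50% finding is far less certain from a small sample.
for n in (100, 1000, 10000):
    print(n, round(margin_of_error(0.5, n) * 100, 1), "percentage points")
```

Note that a tenfold increase in sample size does not make the result ten times more precise; the margin shrinks with the square root of the sample size.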
1. Variance is interesting.
 2. Variance is different for different
 variables and in different
 populations.
 3. The amount of variance is easily
 quantified.
                       - Philip Meyer, Precision Journalism
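
To show how variance can be quantified, here is a small sketch using Python's standard statistics module. The two sets of response times are hypothetical; they share the same mean but spread very differently, which is often where the story lies.

```python
from statistics import mean, pstdev, pvariance

# Hypothetical response times in minutes for two services.
service_a = [8, 9, 10, 11, 12]
service_b = [2, 5, 10, 15, 18]

for name, times in (("A", service_a), ("B", service_b)):
    # pvariance and pstdev treat the list as the whole population.
    print(name, mean(times), pvariance(times), round(pstdev(times), 2))
```

Both services average 10 minutes, but the variance for B is many times larger, so reporting the mean alone would hide the difference.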


Getting data in the right form


   Data > Text to columns
   Find & replace
   Conditional formulas:
   =IF(condition, if met, if not)
   =COUNTIF(range, test)
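
For comparison, the same four operations can be sketched in Python. The rows are invented, and this is only an analogue of the spreadsheet features named above, not how a spreadsheet implements them.

```python
# Hypothetical cells as they might arrive in a single spreadsheet column.
raw_rows = ["Camden; 12", "Hackney; 7", "camden; 3", "Islington; 0"]

# Data > Text to columns: split each cell on its delimiter.
rows = [[part.strip() for part in cell.split(";")] for cell in raw_rows]

# Find & replace: standardise one inconsistent spelling.
rows = [[name.replace("camden", "Camden"), count] for name, count in rows]

# =IF(condition, if met, if not): label each row by a test on its value.
labels = ["some" if int(count) > 0 else "none" for _, count in rows]

# =COUNTIF(range, test): count the cells in a range that pass a test.
camden_count = sum(1 for name, _ in rows if name == "Camden")

print(rows, labels, camden_count)
```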

Walkthrough: cleaning data in
    Google Refine

   Edit cells > common transforms
   Edit cells > split multi-valued cells
   Facet > text facet
   Export...
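
The first three menu steps above have rough equivalents in plain Python, sketched below on invented values. This illustrates the idea behind each step and is not Google Refine's own code.

```python
from collections import Counter

# Hypothetical messy column values.
values = ["  london ", "London", "LONDON", "Leeds, York", "leeds"]

# Edit cells > common transforms: trim whitespace, convert to title case.
cleaned = [v.strip().title() for v in values]

# Edit cells > split multi-valued cells: one value per row.
split_values = [part.strip() for v in cleaned for part in v.split(",")]

# Facet > text facet: list each distinct value with how often it appears.
facet = Counter(split_values)

for value, count in facet.most_common():
    print(value, count)
```

A facet is useful mainly as a check: variants of the same name that survive cleaning show up as separate entries with small counts.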


Visualising data




5 things you need to know about
    visualising data

   1. Choose the chart for the purpose
   2. It can be used to spot a lead
   3. Good design is when there’s nothing
   more to take away
   4. It should be self-contained & have refs
   5. Be careful with scales and classes
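
Point 5 can be made concrete with a little arithmetic. The sketch below, using invented figures, computes how much taller one bar is drawn than another when the axis does or does not start at zero; the function name is hypothetical.

```python
def bar_height_ratio(a, b, baseline=0):
    """Ratio of the drawn heights of bars for values b and a
    when the value axis starts at baseline."""
    return (b - baseline) / (a - baseline)


# Hypothetical figures: 55 is 10% more than 50.
true_ratio = bar_height_ratio(50, 55)           # axis starts at zero
truncated_ratio = bar_height_ratio(50, 55, 45)  # axis starts at 45

print(true_ratio, truncated_ratio)
```

With the axis cut off at 45, a 10% difference is drawn as one bar twice the height of the other.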
or http://chartchooser.juiceanalytics.com/
What is wrong with this picture?

http://simplecomplexity.net/statistics-without-context/


http://junkcharts.typepad.com/junk_charts/trifecta-checkup/

Visualisation tools


   ManyEyes
   Tableau
   Wordle, Tagxedo
   BatchGeo
   Gephi
   Delicious.com/paulb/visualisation+tools
Walkthrough: visualising data
    with Google Gadgets

Walkthrough: visualising data in
    ManyEyes

Mashing data




5 things you need to know about
    mashing data

   1. It is what a journalist does best
   2. Look for a point of connection: place?
   Person? Company? Date?
   3. What an API can do
   4. What APIs there are
   5. Mashups can be live, updated or
   static
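
A mashup at its simplest is a join on that point of connection. The sketch below combines two invented datasets on a shared company name; all names and figures are hypothetical.

```python
# Hypothetical datasets sharing a point of connection: the company name.
donations = [
    {"company": "Acme Ltd", "donation": 5000},
    {"company": "Bolt plc", "donation": 12000},
]
contracts = [
    {"company": "Acme Ltd", "contract_value": 250000},
    {"company": "Cogs & Co", "contract_value": 40000},
]

# Index one dataset by the shared key so lookups are direct.
contracts_by_company = {row["company"]: row for row in contracts}

# Keep only companies that appear in both datasets, merging their fields.
mashed = [
    {**d, "contract_value": contracts_by_company[d["company"]]["contract_value"]}
    for d in donations
    if d["company"] in contracts_by_company
]

print(mashed)
```

The join only works if the key is written identically in both sources, which is one reason cleaning comes before mashing.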
Mashup tools


   Yahoo! Pipes
   OpenHeatMap
   Mapalist
   xFruits
   Scraperwiki
   Maptube
Walkthrough: making mashups
    with Yahoo! Pipes

   Inputs - Fetch Feed, CSV, Data, Page,
   YQL, Flickr, Form
   Operators - Filter, Sort, Unique, Union,
   Count, Split, Rename, Regex,
   Location extractor, URL Builder
   Outputs - Map, Gallery, List, XML, KML
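
The input, operator, output flow can be sketched in Python as a chain of small steps. This is an analogue only: the items are invented stand-ins for what a Fetch Feed input would return, and nothing here calls Yahoo! Pipes itself.

```python
# Hypothetical feed items standing in for a fetched feed.
items = [
    {"title": "Council cuts library hours", "link": "http://example.com/a", "date": "2011-03-01"},
    {"title": "Library petition grows", "link": "http://example.com/b", "date": "2011-03-05"},
    {"title": "Match report", "link": "http://example.com/c", "date": "2011-03-03"},
    {"title": "Library petition grows", "link": "http://example.com/b", "date": "2011-03-05"},
]

# Filter: keep items whose title mentions the keyword.
filtered = [i for i in items if "library" in i["title"].lower()]

# Unique: drop repeats, judged here by link.
seen = set()
unique = []
for item in filtered:
    if item["link"] not in seen:
        seen.add(item["link"])
        unique.append(item)

# Sort: newest first (ISO dates sort correctly as text).
result = sorted(unique, key=lambda i: i["date"], reverse=True)

# Output: a simple list of titles.
titles = [i["title"] for i in result]
print(titles)
```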
Walkthrough: making mashups
    with OpenHeatMap

   Format the spreadsheet
   Publish it as CSV
   Copy link
   Paste it at OpenHeatMap
   Fix any problems
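
The first two steps amount to producing a tidy CSV with one location column and one value column. The sketch below writes such a file in memory; the column names "county" and "value" and the figures are assumptions for illustration, so check OpenHeatMap's own guidance for the headers it recognises.

```python
import csv
import io

# Hypothetical data: one location column, one numeric column.
rows = [
    {"county": "Kent", "value": 42},
    {"county": "Essex", "value": 17},
]

# Write to an in-memory buffer; a real file opened for writing works the same way.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["county", "value"])
writer.writeheader()
writer.writerows(rows)

csv_text = buffer.getvalue()
print(csv_text)
```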

Walkthrough: grabbing geo data
    with Google Refine

   Edit column > Add column by fetching
   URLs
   Use GREL (Google Refine Expression
   Language)
   Search web for help & examples
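
The idea behind "Add column by fetching URLs" is to build a request URL from each cell's value and then pull fields out of the response. The sketch below shows that logic in Python rather than GREL. The endpoint and the response layout are hypothetical, and no network request is made; a real geocoding service will have its own URL format and response structure.

```python
import json
from urllib.parse import quote

# Hypothetical endpoint, not a real geocoding service.
BASE_URL = "https://geocoder.example/json?address="


def build_url(value):
    """Build the request URL for one cell value, escaping it for use in a URL."""
    return BASE_URL + quote(value)


# Hypothetical response body, in an assumed layout.
sample_response = '{"results": [{"lat": 51.5279, "lng": -0.1025}]}'


def extract_lat_lng(body):
    """Parse the JSON text and return the first result's coordinates."""
    first = json.loads(body)["results"][0]
    return first["lat"], first["lng"]


print(build_url("City University, London"))
print(extract_lat_lng(sample_response))
```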

Questions?




Links


   OnlineJournalismClasses.tumblr.com
   Delicious.com/paulb/cityoj09
   Delicious.com/paulb/datajournalism
   Delicious.com/paulb/visualisation
   Delicious.com/paulb/statistics
   Delicious.com/paulb/mashups
Lab


  Before the lab: play with these
  techniques yourself, have problems,
  find solutions, raise questions. Install
  Google Refine and Tableau on your
  laptop to use.
  - Visualise, interrogate or mash data
Books


   Kaiser Fung - Numbers Rule Your World
   Ben Goldacre - Bad Science
   Donna Wong - The WSJ Guide to
   Information Graphics
   Brian Suda - A Practical Guide to
   Designing with Data
