# Visualizing Data Journalism (HasGeek Fifth Elephant)

The presentation is in two parts. The first introduces the core fundamentals of data visualization and applies them in two case studies. The second covers the challenges of data journalism and what pykih is doing about them.

### Visualizing Data Journalism (HasGeek Fifth Elephant)

1. 1. Visualizing Data Journalism. Ritvvij Parrikh, Founder, www.pykih.com. Fifth Elephant, Delhi Run-up Event, India Today Mediaplex, June 14, 2014
2. 2. Pykih is a data visualization company. We build custom visual representations of large data sets to make data actionable for readers. We have satisfied customers in six countries. Introduction
3. 3. • Data Viz. • Theory • Case Study 1 • Case Study 2 • Summary • Challenges in Data Journalism • What we are doing about it for ourselves Agenda
4. 4. Data Visualization
5. 5. Let’s explore the humble pie chart… Percentage by party: E 38%, D 25%, C 20%, B 15%, A 2%. Break the whole into parts.
6. 6. Let’s explore the humble pie chart… Percentage by party: E 38%, D 25%, C 20%, B 15%, A 2%. Break the whole into parts. Data: One dimensional. Visual Encoding: Area.
7. 7. New Terms • Dimension: Columns by which you group data. • Facts: Numbers that you can count, sum, average, etc. • Examples: seat count by party; seat count by party and state. • Visual Encoding: Area, Position, Colour, Length, Thickness, etc.
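The dimension/fact distinction can be sketched in a few lines of Python. This is only an illustration: the rows, the `state` values and the helper name `sum_fact` are made up for the example and do not come from the slides.

```python
from collections import defaultdict

# Hypothetical rows: two dimensions (party, state) and one fact (seats).
rows = [
    {"party": "A", "state": "X", "seats": 3},
    {"party": "A", "state": "Y", "seats": 2},
    {"party": "B", "state": "X", "seats": 4},
    {"party": "B", "state": "Y", "seats": 1},
]


def sum_fact(rows, dimensions, fact):
    """Group rows by the given dimension columns and sum the fact column."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[fact]
    return dict(totals)


# One dimension: seat count by party.
seats_by_party = sum_fact(rows, ["party"], "seats")
# Two dimensions: seat count by party and state.
seats_by_party_and_state = sum_fact(rows, ["party", "state"], "seats")
```

The same fact (`seats`) is summed in both calls; only the list of grouping columns changes.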
8. 8. One-dimensional Charts PIE is a one-dimensional chart
9. 9. One-dimensional Charts … A pie could have been a random shape broken by percentage
10. 10. One-dimensional Charts … Pie, Amoeba, Percentage Rectangle, Donut, Percentage Triangle, Bubble, Election Donut, Funnel, Percentage Bar, Percentage Column. #1 - The same data can be visualized in many (MANY!) different ways.
11. 11. One-dimensional Charts … Source: thehindu.com What is wrong here?
12. 12. One-dimensional Charts … What is wrong here? Problems: • Colour communicates no data • 3D communicates no data. Source: thehindu.com
13. 13. One-dimensional Charts … Source: thehindu.com #2 - Your goal is to communicate data. Wrong use of visual encoding confuses. Problems: • Colour communicates no data • 3D communicates no data
14. 14. One-dimensional Charts … Source: firstpost.com What is wrong here?
15. 15. One-dimensional Charts … What is wrong here? Problems: • Colour • Too many values. Too cluttered. Source: firstpost.com
16. 16. One-dimensional Charts … Problems: • Colour • Too many values. Too cluttered. #3 - AREA encoding is useful for only a few values; beyond that it becomes unreadable. Source: firstpost.com
17. 17. One-dimensional Charts … Solution to problem of restricted space? Create a custom chart.
18. 18. New Data Set. One dimensional: Seat count by party. Grouped One dimensional: Seat count by party grouped by alliance.
19. 19. Grouped One-dimensional Charts. Party, Alliance, Percentage: (A, NDA, 38%), (B, NDA, 25%), (C, NDA, 20%), (D, UPA, 15%), (E, Others, 2%).
20. 20. Grouped One-dimensional Charts. Group various bubbles by colours. Party, Alliance, Percentage: (A, NDA, 38%), (B, NDA, 25%), (C, NDA, 20%), (D, UPA, 15%), (E, Others, 2%).
21. 21. Grouped One-dimensional Charts. Group various bubbles by colours. Party, Alliance, Percentage: (A, NDA, 38%), (B, NDA, 25%), (C, NDA, 20%), (D, UPA, 15%), (E, Others, 2%). #4 - You can always fit in an extra dimension (GROUP) in charts using colour.
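As a rough sketch of lesson #4, the group dimension (alliance) can be mapped to colour while the party percentage still drives the size of each bubble. The party, alliance and percentage values are the ones on the slide; the hex colours and variable names are assumptions chosen for the example.

```python
# (party, alliance, percentage) rows from the slide.
parties = [
    ("A", "NDA", 38),
    ("B", "NDA", 25),
    ("C", "NDA", 20),
    ("D", "UPA", 15),
    ("E", "Others", 2),
]

# Hypothetical palette; any set of clearly distinct colours would do.
palette = ["#e6550d", "#3182bd", "#969696"]

# Give each alliance one colour, in order of first appearance.
alliance_colour = {}
for _, alliance, _ in parties:
    if alliance not in alliance_colour:
        alliance_colour[alliance] = palette[len(alliance_colour) % len(palette)]

# Each bubble keeps its own percentage but inherits its alliance's colour.
bubbles = [
    {"party": party, "percentage": pct, "colour": alliance_colour[alliance]}
    for party, alliance, pct in parties
]
```

Parties A, B and C end up with the same colour because they share an alliance, which is what lets the reader see the group at a glance.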
22. 22. New Data Set. One dimensional: Seat count by party. Grouped One dimensional: Seat count by party grouped by alliance. Two dimensional: Which party won in which year.
23. 23. Two-dimensional Charts. Plot two data points. Party, Constituency: (A, Z), (B, Y), (C, X), (D, V), (E, W). Visual encoding: Position, Length
24. 24. Two-dimensional Charts… Connect the dots and you get a line chart.
25. 25. Two-dimensional Charts… Scatter, Line, Area, Bar, Column, Spider. All these charts require the same data. #5 - Number of dimensions in data determines which chart to use.
26. 26. New Data Set. One dimensional: Seat count by party. Grouped One dimensional: Seat count by party grouped by alliance. Two dimensional: Which party won in which constituency. Weighted Two dimensional: Which party won in which constituency by what vote margin.
27. 27. Weighted Two-dimensional Charts This is a 2d chart.
28. 28. Weighted Two-dimensional Charts … Let’s add weight to it; now we have three data points. X axis, Y axis, Weight: (A, Z, 40), (B, Y, 20), (C, X, 1), (D, V, 300), (E, W, 60). Visual encoding: Position, Length, Area
29. 29. Weighted Two-dimensional Charts … Weighted Scatter, Circle Comparison. All these charts require the same data. #6 - You can always fit in an extra fact (WEIGHT) in charts using size.
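One detail worth spelling out when weight is encoded as size: readers compare circle areas, so the radius should grow with the square root of the weight rather than with the weight itself. The sketch below uses the weights from the slide; the maximum radius and the function name `radius_for` are assumptions for illustration.

```python
import math

# (x, y, weight) rows from the slide.
points = [("A", "Z", 40), ("B", "Y", 20), ("C", "X", 1), ("D", "V", 300), ("E", "W", 60)]

MAX_RADIUS = 30.0  # hypothetical radius, in pixels, of the largest bubble


def radius_for(weight, max_weight, max_radius=MAX_RADIUS):
    """Scale the radius so that circle AREA is proportional to weight."""
    return max_radius * math.sqrt(weight / max_weight)


max_weight = max(weight for _, _, weight in points)
radii = {x: radius_for(weight, max_weight) for x, _, weight in points}
```

With this scaling, A (weight 40) gets twice the area of B (weight 20), not twice the radius.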
30. 30. New Data Set. One dimensional: Seat count by party. Grouped One dimensional: Seat count by party grouped by alliance. Two dimensional: Which party won in which constituency. Weighted Two dimensional: Which party won in which constituency by what vote margin. Grouped Weighted Two dimensional: Which party won in which constituency by what vote margin grouped by alliance.
31. 31. Grouped Weighted Two-dimensional Charts. Grouped Weighted Scatter, Grouped Circle Comparison. Visual encoding: Position, Length, Area, Colour
32. 32. Multi-series Two-dimensional Charts … Range, Gantt, Multi-series Line, Group Column, Stack Column, Group Stack Column, Stack Area, Stack Percentage Area. Add more dimensions in creative ways.
33. 33. Multi-series Two-dimensional Charts … What is right and wrong here? Source: livemint.com Is the equities rally percolating into the broader market?
34. 34. Multi-series Two-dimensional Charts … What is right and wrong here? Source: livemint.com Is the equities rally percolating into the broader market? Bad parts: • The BSE Small-cap line is not visible, and that’s the story.
35. 35. Multi-series Two-dimensional Charts … What is right and wrong here? Good parts: • Y axis starts from 97 instead of 0. Source: livemint.com Is the equities rally percolating into the broader market? Bad parts: • The BSE Small-cap line is not visible, and that’s the story. #7 - The purpose of a line chart is to show trend. Focus on it.
36. 36. Multi-series Two-dimensional Charts … What is wrong here? Source: livemint.com Does IMF wear rose-tinted glasses?
37. 37. Multi-series Two-dimensional Charts … What is wrong here? Source: livemint.com Problems: • Cannot find the IMF line. Does IMF wear rose-tinted glasses?
38. 38. Multi-series Two-dimensional Charts … What is wrong here? Source: livemint.com Does IMF wear rose-tinted glasses? Problems: • Cannot find the IMF line. #8 - Highlight the story for the user. Use color to highlight, not confuse.
39. 39. New Data Set. All the data we encountered so far was relational (RDBMS-style), i.e. it could fit in a spreadsheet (rows and columns). Sometimes data is more complex. It can have “relationships”. Types of relationships: • Hierarchy / Tree • Multi-level relationships
40. 40. Tree Charts { "name": "root", "children": [ { "name": "A", "children": [ {"name": "A1"}, {"name": "A2"}, {"name": "A3"}, {"name": "A4"} ] } ] } Visual encoding: Position
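To make the hierarchical format concrete, here is a minimal sketch that walks the tree shown on the slide. Depth and leaf lists are the kind of values a dendrogram layout typically needs; the helper names `walk` and `leaves` are illustrative, not part of any charting library.

```python
# The tree from the slide, with the brackets closed.
tree = {
    "name": "root",
    "children": [
        {
            "name": "A",
            "children": [
                {"name": "A1"},
                {"name": "A2"},
                {"name": "A3"},
                {"name": "A4"},
            ],
        }
    ],
}


def walk(node, depth=0):
    """Yield (name, depth) for every node, parents before children."""
    yield node["name"], depth
    for child in node.get("children", []):
        yield from walk(child, depth + 1)


def leaves(node):
    """Return the names of all nodes that have no children."""
    children = node.get("children", [])
    if not children:
        return [node["name"]]
    return [name for child in children for name in leaves(child)]


nodes_with_depth = list(walk(tree))
leaf_names = leaves(tree)
```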
41. 41. Tree Charts. Dendrogram, Circular Dendrogram
42. 42. Grouped Weighted Tree Charts Packed Circle Sunburst Tree Rectangle Tree Bar Grouped Weighted Tree. Visual encoding: Position, Size, Colour
43. 43. Grouped Weighted Tree Charts. Sunburst. Visual encoding: Position, Size, Colour
44. 44. Grouped Multi-level Relationship Charts { "nodes": [ {"name": "A", "group": "G1"}, {"name": "B", "group": "G2"}, … ], "relations": [ {"from": "A", "to": "B"}, {"from": "A", "to": "C"}, … ] } Visual encoding: Position
45. 45. Grouped Multi-level Relationship Charts. Graph, Collapsible Graph, Hive. #9 - Look for relationships across data sets.
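A nodes-and-relations structure like the one on the previous slide is usually turned into an adjacency list before it is laid out as a graph. The sketch below assumes the relations are undirected; node C and its group are filled in only so the example runs, since the slide elides the rest of the data.

```python
graph = {
    "nodes": [
        {"name": "A", "group": "G1"},
        {"name": "B", "group": "G2"},
        {"name": "C", "group": "G2"},  # assumed: the slide does not show C's group
    ],
    "relations": [
        {"from": "A", "to": "B"},
        {"from": "A", "to": "C"},
    ],
}


def build_adjacency(graph):
    """Map each node name to the set of nodes it is related to."""
    adjacency = {node["name"]: set() for node in graph["nodes"]}
    for relation in graph["relations"]:
        # Treat every relation as undirected (an assumption of this sketch).
        adjacency[relation["from"]].add(relation["to"])
        adjacency[relation["to"]].add(relation["from"])
    return adjacency


adjacency = build_adjacency(graph)
# Degree (number of relations per node) is a common input for node size.
degree = {name: len(linked) for name, linked in adjacency.items()}
```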
46. 46. Weighted Grouped Multi-level Relationship Charts. Sankey. Visual encoding: Position, Color, Size
47. 47. Case: Mumbai Local Fare Chart. A fare exists for travel between station "A" and "B". Hence, it is a relationship chart.
48. 48. Case: Mumbai Local Fare Chart. Matrix, Half Matrix. [ {"node1": "A", "node2": "B", "weight": 300}, {"node1": "A", "node2": "C", "weight": 900}, … ]
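The difference between a matrix and a half matrix can be sketched as follows, under the assumption that the fare from A to B equals the fare from B to A, so one triangle of the matrix carries all the information. The A-B and A-C weights are from the slide; the B-C entry is made up to complete the example.

```python
fares = [
    {"node1": "A", "node2": "B", "weight": 300},
    {"node1": "A", "node2": "C", "weight": 900},
    {"node1": "B", "node2": "C", "weight": 600},  # assumed value, not on the slide
]

# Fix an order for the stations so rows and columns line up.
stations = sorted({fare[key] for fare in fares for key in ("node1", "node2")})
index = {station: i for i, station in enumerate(stations)}
n = len(stations)

# Full matrix: every pair appears twice, once on each side of the diagonal.
matrix = [[0] * n for _ in range(n)]
for fare in fares:
    i, j = index[fare["node1"]], index[fare["node2"]]
    matrix[i][j] = matrix[j][i] = fare["weight"]

# Half matrix: keep only the cells above the diagonal.
half_matrix = [row[i + 1:] for i, row in enumerate(matrix)]
```

The half matrix holds three values instead of nine cells, which is why it suits a symmetric fare chart.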
49. 49. Case: Mumbai Local Fare Chart #9 - Look for limitations. They can help you improve design.
50. 50. Weighted Two-level Relationship Charts … Chord. Number of people travelling between various stations.
51. 51. Taxonomy of Standard Data Visualizations: • One dimensional charts • Grouped one dimensional charts • Two dimensional charts • Weighted Two dimensional charts • Grouped Two dimensional charts • Grouped Weighted Two dimensional charts • Multi-dimensional Charts • Tree Charts • Grouped Weighted Tree Charts • Multi-level Relationships Charts • Grouped Weighted Multi-level Relationships Charts • Two-level Relationships Charts • Grouped Weighted Two-level Relationships Charts
52. 52. The same data can be visualized in many (MANY!) ways. Without exploring the data, you will end up visualizing all your data in pies, lines and bars. Most Imp. Lesson
53. 53. Implications

    | | One Dimension | Two Dimension | Multi-Dimension | Relationship | Hierarchical | Geo Maps |
    |---|---|---|---|---|---|---|
    | Dimension: Time | N | Y | Y | N | N | N |
    | Dimension: Group | Y | Y | Y | Y | Y | Y |
    | Fact: Weight | N | Y | Y | Y | Y | Y |
    | Group and Weight | N | Y | Y | Y | Y | N |
    | Fact: Many values | Maybe | Y | Y | Y | Y | Y |
    | Multiple levels / Zoomable | N | N | N | Y | Y | Y |

54. 54. List of Visual Encodings Source: http://complexdiagrams.com/properties
55. 55. Case Study #1: Let’s apply what we learnt IPL Score Card
56. 56. ESPNCricInfo Score Card
57. 57. Ball-by-ball commentary; per-batsman statistics; per-bowler statistics; fall of wickets; partnerships; two innings. Pre-match: toss, playing 11, location, time. Post-match: win, by how much, man of the match. Second innings: current run rate, required run rate, target score.
58. 58. Overs: Most important data-point. 1. Overs = Time. 2. One over: has_many balls, has_one bowler, has_many batsmen. 3. Existence of batsmen across overs is partnerships. 4. Partnerships and Fall of wickets are the same data set.
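The over-centric model described above might look like this in code. It is a simplified sketch, not the data model behind the actual score card: class and field names are invented, extras and run-outs are ignored, and the sample innings is made up. It does show why partnerships and fall of wickets are the same data set, since each wicket simply closes the current partnership.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Ball:
    batsman: str
    runs: int
    wicket: bool = False


@dataclass
class Over:
    number: int
    bowler: str  # has_one bowler
    balls: List[Ball] = field(default_factory=list)  # has_many balls

    @property
    def batsmen(self) -> List[str]:
        """has_many batsmen: everyone who faced a ball in this over."""
        seen: List[str] = []
        for ball in self.balls:
            if ball.batsman not in seen:
                seen.append(ball.batsman)
        return seen


def partnerships(overs: List[Over]) -> List[int]:
    """Runs per partnership; every wicket ends the current partnership."""
    result, runs = [], 0
    for over in overs:
        for ball in over.balls:
            runs += ball.runs
            if ball.wicket:
                result.append(runs)
                runs = 0
    result.append(runs)  # the unbroken partnership at the end
    return result


# Hypothetical two-over innings.
innings = [
    Over(1, "Bowler X", [Ball("P", 4), Ball("Q", 1), Ball("P", 0, wicket=True)]),
    Over(2, "Bowler Y", [Ball("R", 6), Ball("Q", 2)]),
]
partnership_runs = partnerships(innings)
```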
59. 59. Ball by ball Commentary
60. 60. Partnerships
61. 61. Combine the two. Weighted two-dimensional chart: Y-axis: balls per over; X-axis: overs + bowlers. Gantt chart: Y-axis: batsmen; X-axis: overs + bowlers. All other "zoomable" information is shown via interactions.
62. 62. Putting it all together
63. 63. Let’s see it live http://www.firstpost.com/cricket-live-score/IPL/1-jun-2014-kolkata-knight-riders-versus-kings-xi-punjab/2173/175977
64. 64. Less reading. No scrolling. More awareness.
65. 65. Case Study #2: Let’s apply what we learnt Election Counting Day
66. 66. Election Counting Day. Data Set: • India has 50+ regional parties and two national parties. • During Election Counting Day (live), seats are either “Leading” or “Won”. Data Properties / Relationships: • Hierarchical relation between Alliance and Party. • Won is confirmed. Leading is transient. What readers wanted to know this election: • How badly would UPA lose? • How big would the BJP victory be? • How big would the impact of AAP be? Real-world facts to inspire design: • BJP is a right-wing party. • AAP is leftmost, followed by UPA. • The Sansad Hall is a semi-circle.
67. 67. Election Counting Day. Data Set: • India has 50+ regional parties and two national parties. • During Election Counting Day (live), seats are either “Leading” or “Won”. Data Properties / Relationships: • Hierarchical relation between Alliance and Party. • Won is confirmed. Leading is transient. What readers wanted to know this election: • How badly would UPA lose? • How big would the BJP victory be? • How big would the impact of AAP be? Real-world facts to inspire design: • BJP is a right-wing party. • AAP is leftmost, followed by UPA. • The Sansad Hall is a semi-circle. -> Group -> Tree -> Weight -> Limitation (hence, all other parties can be clubbed into “other”) -> Shape -> Placement
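A rough sketch of the data shaping these properties imply: roll parties up into their alliance, keep "won" and "leading" as separate counts because only the former is final, and club everything outside the groups readers care about into "Others". The party names, seat numbers and the choice of `MAIN_GROUPS` are all assumptions for the example.

```python
# Groups shown individually; everything else is clubbed (an assumption).
MAIN_GROUPS = ("NDA", "UPA", "AAP")

# Hypothetical live feed: one row per party.
results = [
    {"party": "P1", "alliance": "NDA", "won": 10, "leading": 5},
    {"party": "P2", "alliance": "NDA", "won": 3, "leading": 1},
    {"party": "P3", "alliance": "UPA", "won": 2, "leading": 2},
    {"party": "P4", "alliance": "AAP", "won": 0, "leading": 1},
    {"party": "P5", "alliance": "Regional X", "won": 1, "leading": 0},
    {"party": "P6", "alliance": "Regional Y", "won": 0, "leading": 2},
]


def tally(results):
    """Sum won and leading seats per top-level group."""
    totals = {}
    for row in results:
        group = row["alliance"] if row["alliance"] in MAIN_GROUPS else "Others"
        bucket = totals.setdefault(group, {"won": 0, "leading": 0})
        bucket["won"] += row["won"]          # confirmed
        bucket["leading"] += row["leading"]  # transient, may still change
    return totals


totals = tally(results)
```

Keeping the two counts apart lets a chart draw confirmed seats solid and leading seats in a lighter shade.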
68. 68. Choosing the right Grouped Weighted Tree Chart Packed Circle Sunburst Tree Rectangle Tree Bar Grouped Weighted Tree. Visual encoding: Position, Size, Colour
69. 69. Election Counting Day … Sunburst
71. 71. Sansad Chart. Focus on what is most important. Alliance is more important than Party. We spent 200% more time reversing the hierarchy.
72. 72. Let’s see it live http://firstpost.com/election-results
73. 73. Summary 1. Study properties and relationships of your Data Set. 2. Use your visual encodings wisely.
74. 74. Challenges in Data Journalism
75. 75. Steps in data journalism: Data Collection (Govt. data, APIs, Scrape, Mine web, PDFs) -> What’s the story (Clean the data, Model the data, Investigate) -> Visualize (Design, Build) -> Story (Write). Roles: Journalist, Developer, Designer. Technology is an integral part of data journalism.
76. 76. Formats in data journalism: Data Driven Stories: day-to-day short stories derived from data. Visualization App: big apps to educate on large and important events, e.g. budget, election, etc.
77. 77. Format #1 - Data Driven Stories. Badaun Case -> Find legit data -> Analyse -> Plot -> Story. Source: http://factchecker.in/data-are-crimes-against-scheduled-castes-on-an-upswing-in-india/
78. 78. Format #2 - Visualization Apps
79. 79. Implication: Data Collection (Govt. data, APIs, Scrape, Mine web, PDFs) -> What’s the story (Clean the data, Model the data, Investigate) -> Visualize (Design, Build) -> Story (Write). Format: Visualization app: Journalist, Developer, Designer. Format: Data Driven Stories: Journalist.
80. 80. Challenges. High Level: 1. Quick access to appropriate data set 2. Quick analysis of this data 3. Consistently churn out neat charts, graphs and maps
81. 81. Challenges. High Level: 1. Quick access to appropriate data set 2. Quick analysis of this data 3. Consistently churn out neat charts, graphs and maps. Technical: 1. Live Data Modelling 2. SEO 3. How to handle high traffic
82. 82. Challenges. High Level: 1. Quick access to appropriate data set 2. Quick analysis of this data 3. Consistently churn out neat charts, graphs and maps. Technical: 1. Live Data Modelling 2. SEO 3. How to handle high traffic. From pykih perspective: 1. How do you consistently build beautiful, real-time visualizations?
83. 83. What we are doing about it: in-house tool called "Backstage"
84. 84. Principles. #1 - Instead of waiting for data to be standardised, we want to make large-scale, high-velocity, multi-format data extraction durable. #2 - Instead of expecting data-users / journalists to have analytical skills, we are: • simplifying exploration of large data sets • automating extraction of metadata from data sets • simplifying assisted data standardisation • building tools for assisted analysis. #3 - Instead of expecting data-users / journalists to visualize data correctly, we are attempting to automate metadata-driven visualization. Example: If data is ordinal then colour automatically leverages saturation, and if data is nominal then colours are distinct. Other Experiments: • A data-driven blogging software • Configuration Editor. Notes: -> Demo the worker -> Demo the census dashboard -> ISO example -> Demo NLP based Date Standardiser -> Story is in the outliers
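The colour example in principle #3 could be sketched as below. This is not Backstage code. It assumes the two cases are ordinal data (ordered categories, shown as one hue with increasing saturation) and nominal data (unordered categories, shown as clearly different hues); the function names, default hue and fixed brightness are choices made for the illustration.

```python
import colorsys


def to_hex(hue, saturation, value):
    """Convert an HSV colour (all components in 0..1) to a #rrggbb string."""
    r, g, b = colorsys.hsv_to_rgb(hue, saturation, value)
    return "#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255))


def palette(n, scale_type, hue=0.6):
    """Pick n colours from metadata about the column's scale type."""
    if scale_type == "ordinal":
        # Same hue, rising saturation: order is visible in the colour itself.
        return [to_hex(hue, (i + 1) / n, 0.9) for i in range(n)]
    if scale_type == "nominal":
        # Evenly spaced hues: categories look distinct, with no implied order.
        return [to_hex(i / n, 0.8, 0.9) for i in range(n)]
    raise ValueError("unknown scale type: " + scale_type)


ordinal_colours = palette(3, "ordinal")
nominal_colours = palette(4, "nominal")
```

The point of driving this from metadata is that the person making the chart never has to choose the palette by hand.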
85. 85. Summary. Data Visualization company => Data and Visualization company. Effective data journalism leverages NoSQL, memory-based databases, NLP, OLAP modelling, free-text search, statistics, etc.
86. 86. We are at @pykih. Fun fact: The word pykih came to us in a CAPTCHA. That’s the day we decided that as long as we do good work, it does not matter what we are called.