
- 1. Visualizing Data Journalism. Ritvvij Parrikh, Founder, www.pykih.com. Fifth Elephant, Delhi Run-up Event, India Today Mediaplex, June 14, 2014
- 2. Introduction. Pykih is a data visualization company. We build custom visual representations of large data sets to make data actionable for readers. We have satisfied customers in six countries.
- 3. Agenda • Data Viz. • Theory • Case Study 1 • Case Study 2 • Summary • Challenges in Data Journalism • What we are doing about it for ourselves
- 4. Data Visualization
- 5. Let’s explore the humble pie chart… Party Percentage E 38% D 25% C 20% B 15% A 2% Break the whole into parts.
- 6. Let’s explore the humble pie chart… Party Percentage E 38% D 25% C 20% B 15% A 2% Break the whole into parts. Data: One dimensional Visual Encoding: Area
- 7. New Terms • Dimension: Columns by which you group data. • Facts: Numbers that you can count, sum, average, etc. • Examples: Seat count by party; Seat count by party and state. • Visual Encoding: Area, Position, Colour, Length, Thickness, etc.
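The distinction between dimensions and facts can be sketched in a few lines of Python. This is only an illustration: the rows, the state names `X` and `Y`, and the helper `sum_fact` are hypothetical and do not come from the talk.

```python
from collections import defaultdict

# Hypothetical rows: two dimensions (party, state) and one fact (seats).
rows = [
    {"party": "A", "state": "X", "seats": 3},
    {"party": "A", "state": "Y", "seats": 2},
    {"party": "B", "state": "X", "seats": 4},
    {"party": "B", "state": "Y", "seats": 1},
]


def sum_fact(rows, dimensions, fact):
    """Group rows by the given dimension columns and sum the fact column."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[fact]
    return dict(totals)


# "Seat count by party" groups by one dimension.
seats_by_party = sum_fact(rows, ["party"], "seats")
# "Seat count by party and state" groups by two dimensions.
seats_by_party_and_state = sum_fact(rows, ["party", "state"], "seats")
```

The same fact (seats) is summed in both cases; only the set of dimensions used for grouping changes.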
- 8. One-dimensional Charts PIE is a one-dimensional chart
- 9. One-dimensional Charts … A pie could have been a random shape broken by percentage
- 10. One-dimensional Charts … Pie, Amoeba, Percentage Rectangle, Donut, Percentage Triangle, Bubble, Election Donut, Funnel, Percentage Bar, Percentage Column. #1 - The same data can be visualized in many (MANY!) different ways.
- 11. One-dimensional Charts … Source: thehindu.com What is wrong here?
- 12. One-dimensional Charts … What is wrong here? Problems: • Colour communicates no data • 3D communicates no data. Source: thehindu.com
- 13. One-dimensional Charts … Source: thehindu.com. Problems: • Colour communicates no data • 3D communicates no data. #2 - Your goal is to communicate data. Wrong use of visual encoding confuses.
- 14. One-dimensional Charts … Source: firstpost.com. What is wrong here?
- 15. One-dimensional Charts … What is wrong here? Problems: • Colour • Too many values. Too cluttered. Source: firstpost.com
- 16. One-dimensional Charts … Problems: • Colour • Too many values. Too cluttered. #3 - AREA encoding is useful for only a few values; beyond that the chart becomes unreadable. Source: firstpost.com
- 17. One-dimensional Charts … Solution to problem of restricted space? Create a custom chart.
- 18. New Data Set. One dimensional: Seat count by party. Grouped One dimensional: Seat count by party grouped by alliance.
- 19. Grouped One-dimensional Charts Party Alliance Percentage A NDA 38% B NDA 25% C NDA 20% D UPA 15% E Others 2%
- 20. Grouped One-dimensional Charts Group various bubbles by colours Party Alliance Percentage A NDA 38% B NDA 25% C NDA 20% D UPA 15% E Others 2%
- 21. Grouped One-dimensional Charts. Group various bubbles by colours. Party Alliance Percentage A NDA 38% B NDA 25% C NDA 20% D UPA 15% E Others 2%. #4 - You can always fit in an extra dimension (GROUP) in charts using colour.
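A minimal sketch of lesson #4, using the table from the slide. The palette values and the helper `colour_by_group` are assumptions made for illustration; they are not Pykih's implementation.

```python
# Table from the slide: (party, alliance, percentage).
parties = [
    ("A", "NDA", 38),
    ("B", "NDA", 25),
    ("C", "NDA", 20),
    ("D", "UPA", 15),
    ("E", "Others", 2),
]

# Hypothetical palette; any set of visually distinct colours would do.
PALETTE = ["#e6550d", "#3182bd", "#969696"]


def colour_by_group(rows, palette):
    """Assign one colour per group, in order of first appearance."""
    colours = {}
    for _, group, _ in rows:
        if group not in colours:
            colours[group] = palette[len(colours) % len(palette)]
    return colours


alliance_colour = colour_by_group(parties, PALETTE)

# Each bubble keeps its own size (the fact) and borrows its colour from the group.
bubbles = [
    {"label": party, "size": pct, "colour": alliance_colour[alliance]}
    for party, alliance, pct in parties
]
```

Size still carries the percentage, so colour is free to carry the extra dimension (alliance).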
- 22. New Data Set. One dimensional: Seat count by party. Grouped One dimensional: Seat count by party grouped by alliance. Two dimensional: Which party won in which year.
- 23. Two-dimensional Charts. Plot two data points. Party Constituency A Z B Y C X D V E W. Visual encoding: Position, Length
- 24. Two-dimensional Charts… Connect the dots and you get a line chart.
- 25. Two-dimensional Charts… Scatter, Line, Area, Bar, Column, Spider. All these charts require the same data. #5 - Number of dimensions in data determines which chart to use.
- 26. New Data Set. One dimensional: Seat count by party. Grouped One dimensional: Seat count by party grouped by alliance. Two dimensional: Which party won in which constituency. Weighted Two dimensional: Which party won in which constituency by what vote margin.
- 27. Weighted Two-dimensional Charts This is a 2d chart.
- 28. Weighted Two-dimensional Charts … Let’s add weight to it, hence now we have three data points. X axis Y axis Weight A Z 40 B Y 20 C X 1 D V 300 E W 60. Visual encoding: Position, Length, Area
- 29. Weighted Two-dimensional Charts … Weighted Scatter, Circle Comparison. All these charts require the same data. #6 - You can always fit in an extra fact (WEIGHT) in charts using size.
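When weight is encoded as area, the radius of each bubble has to grow with the square root of the weight, otherwise large values look far bigger than they are. The sketch below uses the weights from the slide; the function name `radius_for` and the maximum radius of 50 are assumptions.

```python
import math

# Weights from the slide, keyed by the X-axis value.
weights = {"A": 40, "B": 20, "C": 1, "D": 300, "E": 60}


def radius_for(weight, max_weight, max_radius=50.0):
    """Return a radius such that the circle's area is proportional to weight."""
    return max_radius * math.sqrt(weight / max_weight)


max_weight = max(weights.values())
radii = {key: radius_for(w, max_weight) for key, w in weights.items()}
```

With this scaling, A (weight 40) gets twice the area of B (weight 20), not twice the radius.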
- 30. New Data Set. One dimensional: Seat count by party. Grouped One dimensional: Seat count by party grouped by alliance. Two dimensional: Which party won in which constituency. Weighted Two dimensional: Which party won in which constituency by what vote margin. Grouped Weighted Two dimensional: Which party won in which constituency by what vote margin grouped by alliance.
- 31. Grouped Weighted Two-dimensional Charts. Grouped Weighted Scatter, Grouped Circle Comparison. Visual encoding: Position, Length, Area, Colour
- 32. Multi-series Two-dimensional Charts … Range, Gantt, Multi-series Line, Group Column, Stack Column, Group Stack Column, Stack Area, Stack Percentage Area. Add more dimensions in creative ways.
- 33. Multi-series Two-dimensional Charts … What is right and wrong here? Source: livemint.com Is the equities rally percolating into the broader market?
- 34. Multi-series Two-dimensional Charts … What is right and wrong here? Source: livemint.com. Is the equities rally percolating into the broader market? Bad parts: • The BSE Small-cap line is not visible, and that’s the story.
- 35. Multi-series Two-dimensional Charts … What is right and wrong here? Good parts: • Y axis starts from 97 instead of 0. Source: livemint.com. Is the equities rally percolating into the broader market? Bad parts: • The BSE Small-cap line is not visible, and that’s the story. #7 - Purpose of line chart is to show trend. Focus on it.
- 36. Multi-series Two-dimensional Charts … What is wrong here? Source: livemint.com Does IMF wear rose-tinted glasses?
- 37. Multi-series Two-dimensional Charts … What is wrong here? Source: livemint.com. Problems: • Cannot find the IMF line. Does IMF wear rose-tinted glasses?
- 38. Multi-series Two-dimensional Charts … What is wrong here? Source: livemint.com. Does IMF wear rose-tinted glasses? Problems: • Cannot find the IMF line. #8 - Highlight the story for the user. Use color to highlight, not confuse.
- 39. New Data Set. All the data we encountered so far was relational (RDBMS-style), i.e. it could fit in a spreadsheet (rows and columns). Sometimes data is more complex. It can have “relationships”. Types of relationships: • Hierarchy / Tree • Multi-level relationships
- 40. Tree Charts { "name": "root", "children": [ { "name": "A", "children": [ {"name": "A1"}, {"name": "A2"}, {"name": "A3"}, {"name": "A4"} ] Visual encoding: Position
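The tree on the slide is cut off, so the sketch below closes the brackets itself; that completion is an assumption. It shows how such nested data can be walked to get each node's depth (which a dendrogram uses for position) and its leaves. The helper names `walk` and `leaf_names` are hypothetical.

```python
# The slide's tree, with the missing closing brackets assumed.
tree = {
    "name": "root",
    "children": [
        {
            "name": "A",
            "children": [
                {"name": "A1"},
                {"name": "A2"},
                {"name": "A3"},
                {"name": "A4"},
            ],
        }
    ],
}


def walk(node, depth=0):
    """Yield (name, depth) for every node, parents before children."""
    yield node["name"], depth
    for child in node.get("children", []):
        yield from walk(child, depth + 1)


def leaf_names(node):
    """Return the names of all nodes that have no children."""
    children = node.get("children", [])
    if not children:
        return [node["name"]]
    names = []
    for child in children:
        names.extend(leaf_names(child))
    return names


nodes_with_depth = list(walk(tree))
leaves = leaf_names(tree)
```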
- 41. Tree Charts Dendrogram Circular Dendrogram
- 42. Grouped Weighted Tree Charts. Packed Circle, Sunburst, Tree Rectangle, Tree Bar, Grouped Weighted Tree. Visual encoding: Position, Size, Colour
- 43. Grouped Weighted Tree Charts. Sunburst. Visual encoding: Position, Size, Colour
- 44. Grouped Multi-level Relationship Charts { "nodes": [ {"name": "A", "group": "G1"}, {"name": "B", "group": "G2"}, … ], "relations": [ {"from": "A", "to": "B"}, {"from": "A", "to": "C"}, … ] } Visual encoding: Position
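A nodes-and-relations structure like the one on the slide is usually turned into an adjacency map before it is drawn. The sketch below is an illustration only: node `C` and its group are made up to stand in for the elided entries, and treating relations as undirected is an assumption.

```python
# Data shaped like the slide's example. Node "C" and its group are hypothetical,
# added so that the relation A -> C has both endpoints.
graph = {
    "nodes": [
        {"name": "A", "group": "G1"},
        {"name": "B", "group": "G2"},
        {"name": "C", "group": "G2"},
    ],
    "relations": [
        {"from": "A", "to": "B"},
        {"from": "A", "to": "C"},
    ],
}


def adjacency(graph):
    """Map each node name to the set of its neighbours (relations treated as undirected)."""
    neighbours = {node["name"]: set() for node in graph["nodes"]}
    for relation in graph["relations"]:
        neighbours[relation["from"]].add(relation["to"])
        neighbours[relation["to"]].add(relation["from"])
    return neighbours


neighbours = adjacency(graph)
# The group of each node is what a grouped chart would map to colour.
group_of = {node["name"]: node["group"] for node in graph["nodes"]}
```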
- 45. Grouped Multi-level Relationship Charts Graph Collapsible Graph Hive #9 - Look for relationships across data sets.
- 46. Weighted Grouped Multi-level Relationship Charts. Sankey. Visual encoding: Position, Color, Size
- 47. Case: Mumbai Local Fare Chart A fare exists for travel between station "A" and “B”. Hence, it is a relationship chart.
- 48. Case: Mumbai Local Fare Chart Matrix Half Matrix [ {"node1": "A", "node2": "B", "weight": 300}, {"node1": "A", "node2": "C", "weight": 900}, … ]
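The matrix and half-matrix layouts can both be derived from the weighted pair list on the slide. The sketch below assumes fares are symmetric (A to B costs the same as B to A), which is what makes a half matrix sufficient; the third fare (B to C) and the helper names are hypothetical.

```python
# First two entries are from the slide; the B-C fare is a made-up example.
fares = [
    {"node1": "A", "node2": "B", "weight": 300},
    {"node1": "A", "node2": "C", "weight": 900},
    {"node1": "B", "node2": "C", "weight": 600},
]


def full_matrix(edges):
    """Build a symmetric station-by-station matrix from weighted pairs."""
    stations = sorted({edge[key] for edge in edges for key in ("node1", "node2")})
    index = {station: i for i, station in enumerate(stations)}
    matrix = [[0] * len(stations) for _ in stations]
    for edge in edges:
        i, j = index[edge["node1"]], index[edge["node2"]]
        matrix[i][j] = matrix[j][i] = edge["weight"]
    return stations, matrix


def half_matrix(matrix):
    """Keep only the cells below the diagonal; the rest is redundant when fares are symmetric."""
    return [row[:i] for i, row in enumerate(matrix)]


stations, matrix = full_matrix(fares)
lower = half_matrix(matrix)
```

The half matrix stores each fare once, which is why it takes roughly half the space of the full matrix.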
- 49. Case: Mumbai Local Fare Chart. #10 - Look for limitations. They can help you improve design.
- 50. Weighted Two-level Relationship Charts … Chord. Number of people travelling between various stations.
- 51. Taxonomy of Standard Data Visualizations • One dimensional charts • Grouped one dimensional charts • Two dimensional charts • Weighted Two dimensional charts • Grouped Two dimensional charts • Grouped Weighted Two dimensional charts • Multi-dimensional Charts • Tree Charts • Grouped Weighted Tree Charts • Multi-level Relationships Charts • Grouped Weighted Multi-level Relationships Charts • Two-level Relationships Charts • Grouped Weighted Two-level Relationships Charts
- 52. Most Imp. Lesson. The same data can be visualized in many (MANY!) ways. Without exploring the data, you will end up visualizing all your data in pies, lines and bars.
- 53. Implications

  | | One Dimension | Two Dimension | Multi-Dimension | Relationship | Hierarchical | Geo Maps |
  |---|---|---|---|---|---|---|
  | Dimension: Time | N | Y | Y | N | N | N |
  | Dimension: Group | Y | Y | Y | Y | Y | Y |
  | Fact: Weight | N | Y | Y | Y | Y | Y |
  | Group and Weight | N | Y | Y | Y | Y | N |
  | Fact: Many values | May be | Y | Y | Y | Y | Y |
  | Multiple levels / Zoomable | N | N | N | Y | Y | Y |
- 54. List of Visual Encodings Source: http://complexdiagrams.com/properties
- 55. Case Study #1: Let’s apply what we learnt IPL Score Card
- 56. ESPNCricInfo Score Card
- 57. Ball by ball Commentary, Per Batsman Statistics, Per Bowler Statistics, Fall of Wickets, Partnerships, Two innings. Pre-match: Toss, Playing 11, Location, Time. Post-match: Win, by how much, Man of the match. Second Innings: Current Run Rate, Required Run Rate, Target score
- 58. Overs: Most important data-point. 1. Overs = Time 2. One over (a) has_many balls (b) has_one bowler (c) has_many batsmen 3. The presence of the same batsmen across overs defines a partnership 4. Partnerships and Fall of wickets are the same data set
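The over-centred model on this slide can be sketched with plain dataclasses. Everything here is illustrative: the class and field names, the sample players and the helper `overs_together` are hypothetical, and the sketch simplifies by counting a batsman as present in an over only if he faced a ball in it.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Ball:
    batsman: str
    runs: int


@dataclass
class Over:
    number: int          # Overs = Time
    bowler: str          # has_one bowler
    balls: List[Ball] = field(default_factory=list)  # has_many balls

    def batsmen(self):
        """has_many batsmen: everyone who faced a ball in this over, in order."""
        seen = []
        for ball in self.balls:
            if ball.batsman not in seen:
                seen.append(ball.batsman)
        return seen


def overs_together(overs, first, second):
    """Overs in which both batsmen appear, i.e. the span of their partnership."""
    return [
        over.number
        for over in overs
        if first in over.batsmen() and second in over.batsmen()
    ]


# Hypothetical innings: P and Q bat together, then Q is replaced by R.
innings = [
    Over(1, "X", [Ball("P", 1), Ball("Q", 0)]),
    Over(2, "Y", [Ball("Q", 4), Ball("P", 1)]),
    Over(3, "X", [Ball("P", 0), Ball("R", 2)]),
]
partnership_pq = overs_together(innings, "P", "Q")
partnership_pr = overs_together(innings, "P", "R")
```

The over where one partnership ends and the next begins is also where a wicket fell, which is the sense in which partnerships and fall of wickets describe the same data.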
- 59. Ball by ball Commentary
- 60. Partnerships
- 61. Combine the two. Weighted two-dimensional chart: Y-axis: Balls per over, X-axis: Overs + Bowlers. Gantt chart: Y-axis: Batsmen, X-axis: Overs + Bowlers. All other “zoomable” information is shown via interactions.
- 62. Putting it all together
- 63. Let’s see it live http://www.firstpost.com/cricket-live-score/IPL/1-jun-2014-kolkata-knight-riders-versus-kings-xi-punjab/2173/175977
- 64. Less reading. No scrolling. More awareness.
- 65. Case Study #2: Let’s apply what we learnt Election Counting Day
- 66. Election Counting Day. Data Set: • India has 50+ regional parties and two national parties. • During Election Counting Day (live), seats are either “Leading” or “Won”. Data Properties / Relationships: • Hierarchical relation between Alliance and Party • Won is confirmed. Leading is transient. What did readers want to know this election: • How badly would UPA lose • How big would the BJP victory be • How big would the impact of AAP be. Real world facts to inspire design: • BJP is a right wing party • AAP is left most followed by UPA • The Sansad Hall is a semi-circle
- 67. Election Counting Day. Data Set: • India has 50+ regional parties and two national parties. • During Election Counting Day (live), seats are either “Leading” or “Won”. Data Properties / Relationships: • Hierarchical relation between Alliance and Party • Won is confirmed. Leading is transient. What did readers want to know this election: • How badly would UPA lose • How big would the BJP victory be • How big would the impact of AAP be. Real world facts to inspire design: • BJP is a right wing party • AAP is left most followed by UPA • The Sansad Hall is a semi-circle. Design annotations: → Group → Tree → Weight → Limitation (hence, all other parties can be clubbed into Other) → Shape → Placement
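The data properties listed here (party rolls up into alliance, every seat is either Leading or Won, smaller parties are clubbed into Other) suggest a simple roll-up. The sketch below is an assumption about how such a tally could be computed, not the code behind the live chart; the party codes, constituency codes and the function `tally` are hypothetical.

```python
from collections import defaultdict

# Hypothetical live feed: one record per constituency.
seats = [
    {"constituency": "C1", "party": "P1", "status": "Won"},
    {"constituency": "C2", "party": "P1", "status": "Leading"},
    {"constituency": "C3", "party": "P2", "status": "Won"},
    {"constituency": "C4", "party": "P3", "status": "Leading"},
    {"constituency": "C5", "party": "P9", "status": "Leading"},
]

# Hypothetical party-to-alliance hierarchy; parties not listed fall into "Other".
ALLIANCE_OF = {"P1": "NDA", "P2": "NDA", "P3": "UPA"}


def tally(seats, alliance_of):
    """Count Won and Leading seats per alliance, clubbing unknown parties into Other."""
    result = defaultdict(lambda: {"Won": 0, "Leading": 0})
    for seat in seats:
        alliance = alliance_of.get(seat["party"], "Other")
        result[alliance][seat["status"]] += 1
    return dict(result)


by_alliance = tally(seats, ALLIANCE_OF)
```

Keeping Won and Leading as separate counts lets a chart draw confirmed seats differently from transient ones.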
- 68. Choosing the right Grouped Weighted Tree Chart. Packed Circle, Sunburst, Tree Rectangle, Tree Bar, Grouped Weighted Tree. Visual encoding: Position, Size, Colour
- 69. Election Counting Day … Sunburst
- 70. Sansad Chart
- 71. Sansad Chart. Focus on what is most imp. Alliance is more imp. than Party. We spent 200% more time reversing hierarchy.
- 72. Let’s see it live http://firstpost.com/election-results
- 73. Summary 1. Study properties and relationships of your Data Set 2. Use your visual encodings wisely
- 74. Challenges in Data Journalism
- 75. Steps in data journalism. Data Collection: • Govt. data • APIs • Scrape • Mine web • PDFs. What’s the story: • Clean the data • Model the data • Investigate. Visualize: • Design • Build. Story: Write. Roles: Journalist, Developer, Designer. Technology is an integral part of data journalism.
- 76. Formats in data journalism. Data Driven Stories: day-to-day short stories derived from data. Visualization App: big apps that educate readers about a large and important event, e.g. budget, election, etc.
- 77. Format #1 - Data Driven Stories. Source: http://factchecker.in/data-are-crimes-against-scheduled-castes-on-an-upswing-in-india/ Badaun Case → Find legit Data → Analyse → Plot → Story
- 78. Format #2 - Visualization Apps
- 79. Implication. Data Collection: • Govt. data • APIs • Scrape • Mine web • PDFs. What’s the story: • Clean the data • Model the data • Investigate. Visualize: • Design • Build. Story: Write. Format: Visualization app (Journalist, Developer, Designer). Format: Data Driven Stories (Journalist).
- 80. Challenges. High Level: 1. Quick access to appropriate data set 2. Quick analysis of this data 3. Consistently churn out neat charts, graphs and maps
- 81. Challenges. High Level: 1. Quick access to appropriate data set 2. Quick analysis of this data 3. Consistently churn out neat charts, graphs and maps. Technical: 1. Live Data Modelling 2. SEO 3. How to handle high traffic
- 82. Challenges. High Level: 1. Quick access to appropriate data set 2. Quick analysis of this data 3. Consistently churn out neat charts, graphs and maps. Technical: 1. Live Data Modelling 2. SEO 3. How to handle high traffic. From pykih perspective: 1. How do you consistently build beautiful, real-time visualizations?
- 83. What we are doing about it In-house tool called "Backstage"
- 84. Principles. #1 - Instead of waiting for data to be standardised, we want to make large-scale, high-velocity, multi-format data extraction durable. #2 - Instead of expecting data users / journalists to have analytical skills, we are: • simplifying exploration of large data sets • automating extraction of metadata from data sets • simplifying assisted data standardisation • building tools for assisted analysis. #3 - Instead of expecting data users / journalists to visualize data correctly, we are attempting to automate metadata-driven visualization. Example: if data is ordinal, colour automatically leverages saturation; if data is categorical, colour is distinct. Other Experiments • A data-driven blogging software • Configuration Editor. → Demo the worker → Demo the census dashboard → ISO example → Demo NLP based Date Standardiser → Story is in the outliers
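The colour example in principle #3 can be sketched as a small rule that picks a palette from the kind of data. This is not Backstage's code. It assumes the second case on the slide refers to categorical (unordered) data, and the function names, the fixed hue of 0.6 and the sample values are all hypothetical.

```python
import colorsys


def ordinal_colours(n, hue=0.6):
    """One hue for all values, with saturation rising with rank."""
    return [colorsys.hsv_to_rgb(hue, (i + 1) / n, 0.9) for i in range(n)]


def categorical_colours(n):
    """Evenly spaced hues, so every category gets a distinct colour."""
    return [colorsys.hsv_to_rgb(i / n, 0.8, 0.9) for i in range(n)]


def colours_for(values, kind):
    """Choose a palette from metadata about the column (ordinal or categorical)."""
    if kind == "ordinal":
        palette = ordinal_colours(len(values))
    elif kind == "categorical":
        palette = categorical_colours(len(values))
    else:
        raise ValueError(f"unknown data kind: {kind}")
    return dict(zip(values, palette))


ordinal = colours_for(["low", "medium", "high"], "ordinal")
categorical = colours_for(["NDA", "UPA", "Others"], "categorical")
```

The caller only declares what kind of data a column holds; the choice of encoding follows from that metadata.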
- 85. Summary. Data Visualization company => Data and Visualization company. Effective Data Journalism leverages: you will end up using NoSQL, memory-based databases, NLP, OLAP modelling, Free Text Search, Statistics, etc.
- 86. We are at @pykih Fun fact: The word pykih came to us in a CAPTCHA. That’s the day we decided that till we do good work it does not matter what we are called.
