The objective of this course is to introduce the principles and techniques of data visualization. Students will learn the basic concepts of communicating information through graphics and apply these concepts in building a visualization of their own.
Lesson 0: What is Data Visualization?
6. Data Visualization Nikhil Srivastava, 2015
Learn and Apply the Principles and Techniques of Effective Data Visualization
Nikhil Srivastava
nsrivast@gmail.com
0713 987 262
I build products & businesses in the fields of finance & technology.
I organize & visualize information for teaching & understanding.
nikhilsrivastava.com
About You: Homework #1
Savio Abuga
Mosaab Baba
Victor Chweya
Ron Gichuhi
Kevin Kavai
Mutisya Maingi
Andrew Makachia
Andrew Molo
Leon Muchoki
Anthony Mwangi
Waiyaki Njomo
Course Details
• Class format
• Course website [see email]
– Slides, demos, extra material
– Code samples and libraries
– Final projects
Course Overview
• What is Data Visualization? (introduction)
• Thinking and Seeing (foundation & theory)
• From Data to Graphics (building blocks)
• Highcharts & Javascript (construction)
• Class Project (construction)
• [Next Steps]
Data Visualization (DV)
Related terms: Information Visualization, Scientific Visualization, Infographics, Statistical Graphics, Informative Art, Visual Analytics
Related disciplines: Art, Science, Statistics, Journalism, Design
City/Town       County       Population
Ahero           Kisumu           76,828
Athi River      Machakos        139,380
Awasi           Kisumu           93,369
Kangundo-Tala   Machakos        218,557
Karuri          Kiambu          129,934
Kiambu          Kiambu           88,869
Kikuyu          Kiambu          233,231
Kisumu          Kisumu          409,928
Kitale          Trans-Nzoia     106,187
Kitui           Kitui           155,896
Limuru          Kiambu          104,282
Machakos        Machakos        150,041
Molo            Nakuru          107,806
Mwingi          Kitui            83,803
Naivasha        Nakuru          181,966
Nakuru          Nakuru          307,990
Nandi Hills     Trans-Nzoia      73,626
Ruiru           Kiambu          238,858
Thika           Kiambu          139,853
• Which is the most populous city in the list?
• Which county in the list has the most cities?
• Which county in the list has the largest average city?
• What is the population of Limuru?
Data Visualization is:
• Useful
– Answers user questions
– Reduces user workload
(by design, not by default.)
• Important
– Resolves ambiguity
– Locates outliers
– Helps us understand structure and patterns
• Powerful
– Communicates, teaches, inspires
Why use DV?
• Explore, Analyze, Communicate
• Answer user questions, facilitate tasks
• Visual = efficient
Definitions
• “the process that transforms (abstract) data into interactive graphical representations for the purpose of exploration, confirmation, or presentation” [1]
• “finding the artificial memory that best supports our natural means of perception” [2]
• “the use of computer-generated, interactive, visual representations of data to amplify cognition” [3]
• “giving information a visual representation that is useful for analysis and presentation” [4]
A (Brief) History
Planetary Movement Line Chart
Unknown, ~1000 AD
Why is DV relevant?
“The ability to take data—to be able to
understand it, to process it, to extract value from
it, to visualize it, to communicate it— that’s going
to be a hugely important skill in the next
decades, ... because now we really do have
essentially free and ubiquitous data.”
- Hal Varian, 2009
Why is DV relevant?
• In one second …
• Open data
• Open technologies
• Growing use in business, science, media, advertising
Course Scope
           Focus                    Extra
purpose    communicate              explore, analyze
data       numerical, categorical   text, maps, graphs, networks
feature    representation           animation, interactivity
Discussion questions:
What exactly are we looking at?
What are the different components of this image: lines, areas, colors?
What does this image tell you?
Why was it designed this way?
This class is intended to help you answer these questions. We’re going to learn the how and why of data visualization.
How are we going to do this? We’ll start by learning what DV is: where and why it is used.
We’ll learn about what it means for DV to be effective: to be purposefully designed, to use appropriate techniques, and to achieve a specific purpose.
We’ll learn the principles of effective DV: some obvious and some subtle, some universal and some contentious. These principles will be grounded in how humans perceive, process, and draw insight from visual information.
We’ll also learn the techniques for producing effective DV: how different types of charts and graphics work, how to make design choices, and guidelines for good design.
And once we’ve mastered these concepts, we’ll apply them in creating visualizations of our own.
It’s a lot to cover! But before we get started, let’s get to know each other and go through some course details.
Who I am, where I’m from, what I do, what I’m about.
Let’s introduce ourselves. Share with us your answer to the first homework question: a piece of numerical data that describes something about your background, your life, or your interests.
Lastly, before we get started – a few notes on the course.
We’re going to use a variety of formats: lectures, discussion, demos, homework, coding, presentations. (The first class will be more lecture than usual.) Also, please jump in with questions and discussion anytime.
Please continue to monitor the course website. It will have links to everything we cover in class as well as extra material. It will also have links to code samples and libraries, and we’ll put all the projects up at the end of the class. The website will be available after the class ends for your reference.
So the way this class is organized is as follows.
The first half will be information-based.
We’ll start by learning what DV is, what it is used for, how and when it is commonly used (properly and improperly), and why it is relevant. Then we’ll learn how humans process visual information, and why that is relevant to visualization design. Next, we’ll study the building blocks of visualizations – the points, lines, shapes, and colors that represent data – and how to organize them in different ways to produce different types of charts. We’ll learn about a range of commonly used charts for visualization, their use cases, advantages, and disadvantages. After learning what makes a chart, we will consider what makes a *good* chart. We’ll learn principles and guidelines for good visualization design and be able to critique existing visualizations.
The second half will be project-based.
We will do an introduction to Javascript and the charting library Highcharts, and we’ll explore and tinker with examples of basic charts. Then we’ll get to work on our project of building our own data visualizations – working individually or in pairs, we’ll select our data, determine the visualization objectives, make design choices, sketch, prototype, and finally implement the visualizations. We’ll have presentations with class discussion and feedback.
If there’s time, we’ll have a discussion on advanced topics in visualization.
Alright, let’s get started – what is data visualization?
It’s difficult to define precisely: as a field, DV has many related and overlapping goals and descriptions. It is often used interchangeably with different terms, and it falls under many different disciplines.
Better than a definition is an example. Let’s take a look at this table of Kenyan cities showing city name, county name, and city population. Take a moment to understand the structure of this data, because I’m about to ask you a few questions on it.
Now let’s look at the data in a different way. We’re looking at a visualization of the same data known as a bar chart. Each city is represented by a bar whose length is proportional to the city’s population. Cities within the same county are colored the same and grouped together. Within each county, cities are ordered in decreasing population.
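The layout rules just described (group by county, order each county's cities by decreasing population, make bar length proportional to population) fit in a few lines of code. This is a minimal text-only Python sketch, not the course's Highcharts code: plain text has no color, so each county gets a header line instead; counties are listed alphabetically (an assumption, since the slide's county order isn't stated); and the 40-character maximum bar width is an arbitrary choice.

```python
from itertools import groupby

# Rows from the city table: (city/town, county, population).
CITIES = [
    ("Ahero", "Kisumu", 76_828),
    ("Athi River", "Machakos", 139_380),
    ("Awasi", "Kisumu", 93_369),
    ("Kangundo-Tala", "Machakos", 218_557),
    ("Karuri", "Kiambu", 129_934),
    ("Kiambu", "Kiambu", 88_869),
    ("Kikuyu", "Kiambu", 233_231),
    ("Kisumu", "Kisumu", 409_928),
    ("Kitale", "Trans-Nzoia", 106_187),
    ("Kitui", "Kitui", 155_896),
    ("Limuru", "Kiambu", 104_282),
    ("Machakos", "Machakos", 150_041),
    ("Molo", "Nakuru", 107_806),
    ("Mwingi", "Kitui", 83_803),
    ("Naivasha", "Nakuru", 181_966),
    ("Nakuru", "Nakuru", 307_990),
    ("Nandi Hills", "Trans-Nzoia", 73_626),
    ("Ruiru", "Kiambu", 238_858),
    ("Thika", "Kiambu", 139_853),
]

MAX_BAR = 40  # characters used for the longest bar (arbitrary choice)


def bar_chart_lines(rows, max_bar=MAX_BAR):
    """Text bar chart: one header line per county, then that county's cities
    in decreasing population, each bar proportional to population."""
    peak = max(pop for _, _, pop in rows)
    # Sort by county name, then by decreasing population within each county.
    ordered = sorted(rows, key=lambda r: (r[1], -r[2]))
    lines = []
    for county, group in groupby(ordered, key=lambda r: r[1]):
        lines.append(county)
        for city, _, pop in group:
            bar = "#" * round(pop / peak * max_bar)
            lines.append(f"  {city:<14} {bar} {pop:,}")
    return lines


print("\n".join(bar_chart_lines(CITIES)))
```

Because every bar shares one scale, comparing two cities reduces to comparing two lengths, and the grouping places each county's cities side by side.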
Now, let’s answer the same questions by using the visualization.
What are the cognitive steps required?
How easy or difficult is the process?
Now let’s ask an additional question we didn’t ask before.
We’ve learned that data visualization can be useful in telling us things about a set of data, making it easier to find information and answer questions. We’ve also learned that this usefulness depends both on the design of the visualization and the specific information we are looking for.
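For comparison, here is what answering the same questions from the raw table involves once every step is written out. This is a minimal Python sketch; the function names are illustrative only. The first question needs a single scan for a maximum, but the third needs grouping, summing, dividing, and comparing, which is exactly the kind of workload a well-designed chart can take off the reader.

```python
from collections import defaultdict

# Rows from the city table: (city/town, county, population).
CITIES = [
    ("Ahero", "Kisumu", 76_828),
    ("Athi River", "Machakos", 139_380),
    ("Awasi", "Kisumu", 93_369),
    ("Kangundo-Tala", "Machakos", 218_557),
    ("Karuri", "Kiambu", 129_934),
    ("Kiambu", "Kiambu", 88_869),
    ("Kikuyu", "Kiambu", 233_231),
    ("Kisumu", "Kisumu", 409_928),
    ("Kitale", "Trans-Nzoia", 106_187),
    ("Kitui", "Kitui", 155_896),
    ("Limuru", "Kiambu", 104_282),
    ("Machakos", "Machakos", 150_041),
    ("Molo", "Nakuru", 107_806),
    ("Mwingi", "Kitui", 83_803),
    ("Naivasha", "Nakuru", 181_966),
    ("Nakuru", "Nakuru", 307_990),
    ("Nandi Hills", "Trans-Nzoia", 73_626),
    ("Ruiru", "Kiambu", 238_858),
    ("Thika", "Kiambu", 139_853),
]


def most_populous_city(rows):
    """Q1: one pass over the rows, keeping the largest population."""
    return max(rows, key=lambda r: r[2])[0]


def county_with_most_cities(rows):
    """Q2: count rows per county, then take the largest count."""
    counts = defaultdict(int)
    for _, county, _ in rows:
        counts[county] += 1
    return max(counts, key=counts.get)


def county_with_largest_average_city(rows):
    """Q3: group by county, sum and count populations, compare the averages."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for _, county, pop in rows:
        totals[county] += pop
        counts[county] += 1
    return max(totals, key=lambda c: totals[c] / counts[c])


def population_of(rows, name):
    """The Limuru question: a direct lookup by city name."""
    return next(pop for city, _, pop in rows if city == name)


print(most_populous_city(CITIES))
print(county_with_most_cities(CITIES))
print(county_with_largest_average_city(CITIES))
print(population_of(CITIES, "Limuru"))
```

Run against the table, this should report Kisumu, Kiambu (six cities), Nakuru (about 199,000 people per city on average), and 104,282: the county containing the largest city is not necessarily the county with the largest average city.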
Let’s take a look at another example. This is a data set called Anscombe’s Quartet, named after the statistician who devised it. It consists of four separate sets of data, each of which is a list of eleven pairs of numbers: eleven X values, each paired with a Y value. To make this a bit more concrete, you can imagine that each data set describes eleven people, where X represents their height and Y represents their weight.
The interesting thing is that all four of these data sets have nearly identical summary statistics. In every set, the X values have the same average and standard deviation, and the Y values have the same average and standard deviation to two decimal places. Furthermore, the correlation between X and Y is about the same (roughly 0.82) for all sets.
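These summary statistics are easy to check. The sketch below hard-codes the four data sets as they are commonly reproduced from Anscombe's 1973 paper, then computes the mean and sample standard deviation of each column plus the Pearson correlation. It uses only Python's standard `statistics` module; the correlation is written out by hand from its definition.

```python
import statistics

# Anscombe's Quartet, as commonly reproduced from Anscombe (1973):
# four data sets of 11 (x, y) pairs each. Sets I-III share the same x values.
X_123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X_123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient, written out from its definition."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def summarize(xs, ys):
    """Mean and sample standard deviation of each column, plus correlation."""
    return {
        "mean_x": statistics.mean(xs),
        "sd_x": statistics.stdev(xs),
        "mean_y": statistics.mean(ys),
        "sd_y": statistics.stdev(ys),
        "r": pearson(xs, ys),
    }


for name, (xs, ys) in QUARTET.items():
    s = summarize(xs, ys)
    print(f"{name:>3}: mean_x={s['mean_x']:.2f} sd_x={s['sd_x']:.2f} "
          f"mean_y={s['mean_y']:.2f} sd_y={s['sd_y']:.2f} r={s['r']:.2f}")
```

Every row should print the same numbers to two decimal places, even though the four scatter plots look nothing alike.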
And except for the last one (which has a bunch of 8s), there’s not much we can do to distinguish them or describe them meaningfully by just looking at the numbers in the table. Now let’s see what happens when we plot them.
Here we’ve visualized the data in what’s known as a scatter plot. Each dot represents one of the eleven pairs, located on the horizontal axis by X value and on the vertical axis by Y value.
By visualizing the data, we see patterns, outliers, and relationships that were nearly impossible to detect in the table.
So we’ve learned that DV is important. It can help us resolve ambiguous data, locate outliers, and generally understand the structure and pattern of a data set.
Another example. How much bigger do you think line B is compared with line A? 1 times, 2 times, 5 times?
Now, what about these two squares?
And these two cubes?
This exercise shows us that our perception of the relative sizes of objects can be skewed in different dimensions. This is important if we are trying to communicate the relationship between two numbers, because our design choices will affect the interpretation.
Later in the course, we’ll learn about ways of avoiding confusion in these situations.
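The arithmetic behind this exercise is worth spelling out, because it is exactly where design choices creep in: multiplying every side of a mark by k multiplies a line's length by k, a square's area by k², and a cube's volume by k³. The Python sketch below computes both directions; the ratios 2 and 8 are illustrative only, since the actual sizes of A and B are not given in these notes.

```python
# Number of dimensions that carry the visual "size" of each kind of mark.
SHAPES = {"line": 1, "square": 2, "cube": 3}


def size_ratio(side_scale, dimensions):
    """Factor by which length (d=1), area (d=2) or volume (d=3) grows when
    every side of the mark is multiplied by side_scale."""
    return side_scale ** dimensions


def side_scale_for(value_ratio, dimensions):
    """Side-length factor needed so that length/area/volume grows by
    exactly value_ratio: the d-th root of the ratio."""
    return value_ratio ** (1 / dimensions)


for name, d in SHAPES.items():
    print(f"{name:>6}: side x2 -> size x{size_ratio(2, d):g}; "
          f"size x8 needs side x{side_scale_for(8, d):.2f}")
```

Read the other way: if a designer draws a cube with an 8 times longer side to represent a value 8 times larger, its volume is 512 times larger, and it is not obvious to the reader whether side or volume is meant to carry the value.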
Locations of geocoded tweets in Nairobi before the 2013 presidential elections, a collaboration between Ushahidi and Hivos.
Infographic of Twitter activity in Africa in late 2013 produced by Portland Communications.
Interactive tool from the Gapminder Foundation animating the health and wealth of world countries over time. This screenshot shows the historical path of Kenya from 1800 to 2013.
Note the number of data types (life expectancy, GDP, population per country and year) and variety of visual encodings (x- and y- position, size, color, time).
Locate data (What is current temp?)
Analyze trend (Is X improving?)
Understand relationship (between X and Y, as in Anscombe quartet example)
Identify points/areas of interest
Spot outliers
[1] Alexander Lex, Harvard CS171 (2015)
[2] Bertin (1967)
[3] Shneiderman et al. (1999)
[4] John Stasko, CS7450 (2013)
Monastery text apparently plotting the latitude of celestial bodies over time. According to Tufte, it is “a mysterious and isolated wonder in the history of data graphics”; it predates common line charts by 800 years.
Source: http://commons.wikimedia.org/wiki/File:Planetary_Movements.gif
Joseph Priestley produced this timeline of historical empires and rulers to map out the world’s history. Regions are organized vertically, and time proceeds horizontally. Areas represent different rulers or regimes, and the largest areas (empires) are color-coded.
Priestley’s instructions for viewing: “If a person carry his eye horizontally, he sees, in a very short time, all the revolutions that have taken place in any particular country […] and this is done with more exactness, and in much less time, than it could have been done by reading […] If the reader carries his eye vertically, he will see the contemporary state of all the empires subsisting in the world, at any particular time.”
Priestley intended this to be read alongside “A Chart of Biography” that he built four years earlier.
Source: https://en.wikipedia.org/wiki/A_New_Chart_of_History
Source: A Description of a New Chart of History: Containing a View of the Principal Revolutions of Empire that Have Taken Place in the World, Joseph Priestley, 1786
Colored line chart showing various economic metrics: price of bread, exports, national debt. Notable for use of common y-scale (different units), color, linear time axis.
Source: http://libweb5.princeton.edu/visual_materials/maps/websites/thematic-maps/quantitative/sociology-economics/playfair-chronology-1824.jpg
After treating British soldiers wounded in the Crimean War and taking detailed notes on causes of death, Florence Nightingale produced this graphic in a report to Parliament to promote sanitary improvements in hospitals and barracks. The visualization is a variant of a pie chart known as a polar-area diagram, rose diagram, or coxcomb: its wedges have equal angles but varying radii (instead of varying angles and equal radii, as in a pie chart). It breaks down deaths in the Crimean War by cause: blue for preventable disease, red for battle injuries, and black for other causes.
Source: https://en.wikipedia.org/wiki/Pie_chart#/media/File:Nightingale-mortality.jpg
Source: http://itelligencegroup.com/uk/localinsights/reworking-florence-nightingales-diagram-of-the-causes-of-mortality-in-the-army-in-the-east-with-sap-lumira/?isnews=
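The difference between the two layouts comes down to one formula: a wedge with angle θ (in radians) and radius r has area θr²/2. A pie chart fixes r and varies θ; a polar-area diagram fixes θ, so for each wedge's area to stay proportional to its value, r has to grow with the square root of the value. A minimal Python sketch of both layouts, using made-up counts rather than Nightingale's figures:

```python
import math


def pie_angles(values):
    """Pie chart: equal radius, each wedge's angle proportional to its value."""
    total = sum(values)
    return [2 * math.pi * v / total for v in values]


def polar_area_radii(values, max_radius=1.0):
    """Polar-area (rose) diagram: equal angles, radius chosen so that each
    wedge's AREA is proportional to its value. A wedge of angle t and radius r
    has area t * r**2 / 2, so r grows with sqrt(value)."""
    peak = max(values)
    return [max_radius * math.sqrt(v / peak) for v in values]


def wedge_area(angle, radius):
    return 0.5 * angle * radius ** 2


# Hypothetical counts, for illustration only (not Nightingale's data).
counts = [100, 400, 900]
angle = 2 * math.pi / len(counts)  # every rose wedge gets the same angle
radii = polar_area_radii(counts)
areas = [wedge_area(angle, r) for r in radii]

print("pie angles (deg):", [round(math.degrees(a), 1) for a in pie_angles(counts)])
print("rose radii:", [round(r, 3) for r in radii])
print("rose areas relative to first:", [round(a / areas[0], 2) for a in areas])
```

Scaling the radius linearly with the value instead would make a wedge with 9 times the count look 81 times larger in area.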
Doug Engelbart gave a presentation of his research team’s work at the Augmentation Research Center on expanding the computer interface for information management and retrieval. The 1950s and 60s saw rapid advances in modern computing and computer graphics that led to the development of scientific and information visualization.
Source: http://www.wired.com/2013/12/tech-time-warp-engelbart/
Full video: https://www.youtube.com/watch?v=yJDv-zdhzMY