This document summarizes the benefits of data visualization and provides examples of bad visualization practices to avoid. It discusses how visualization can help summarize data, detect patterns and trends, explore data interactively, and tell stories. It also shows examples of misleading visualizations that misuse axes, scales, colours, and maps, or imply false correlations. The document encourages developing a critical eye to evaluate visualizations based on usability, first impressions, return on effort, self-sufficiency, and removing unnecessary elements.
2. Lies, Damned Lies & Dataviz
Bad visualization, and how to avoid it
Dr. Andrew Clegg
Director, Learner Analytics & Data Science
Pearson
@andrew_clegg
3. Part I — Why Visualize?
What are the benefits — when it’s done right?
Part II — Bad Dataviz
How to spot the failures — and how to avoid them yourself
Warning: Contains Opinion!
Introduction
5. What is the goal?
● Summarizing and communicating numbers
● Drawing attention to trends and patterns
● Exploring data interactively
● Capturing attention
● Telling stories
6. How does visualization help?
Playing to your neural hardware's strengths
Your visual system excels at pattern detection & parallel processing. Representing data graphically means you can leverage this "for free".
7. Challenge: estimate x when y = 0
    x      y        x      y        x      y
27.38  24.05    32.31  31.61    75.67  14.83
62.64   7.31    51.84  28.61    34.23  31.65
50.76  16.30    59.04  18.29    51.21   7.69
42.94  26.78    74.63   1.15    47.26  22.90
 8.72  42.35    56.15  11.37    66.60   3.21
30.62  30.87    47.23  19.49    17.46  40.31
62.63   9.14    59.36   8.82    65.70  12.79
63.21  18.66    44.58  19.12    52.24  12.92
40.49  23.29    47.85  20.55    62.56  14.17
22.07  41.46    68.21  11.99    40.43  19.77
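The deck presumably answers this challenge on the following slides with a scatter plot. Purely for reference, a least-squares line through the table above gives the same answer numerically; this is a sketch using NumPy, and the printed intercept is an estimate from this fit, not a figure stated in the deck:

```python
import numpy as np

# The thirty (x, y) pairs from the table above.
x = np.array([27.38, 62.64, 50.76, 42.94, 8.72, 30.62, 62.63, 63.21, 40.49, 22.07,
              32.31, 51.84, 59.04, 74.63, 56.15, 47.23, 59.36, 44.58, 47.85, 68.21,
              75.67, 34.23, 51.21, 47.26, 66.60, 17.46, 65.70, 52.24, 62.56, 40.43])
y = np.array([24.05, 7.31, 16.30, 26.78, 42.35, 30.87, 9.14, 18.66, 23.29, 41.46,
              31.61, 28.61, 18.29, 1.15, 11.37, 19.49, 8.82, 19.12, 20.55, 11.99,
              14.83, 31.65, 7.69, 22.90, 3.21, 40.31, 12.79, 12.92, 14.17, 19.77])

# Fit a straight line y = a*x + b, then solve a*x + b = 0.
a, b = np.polyfit(x, y, 1)
x_at_y0 = -b / a
print(f"estimated x when y = 0: {x_at_y0:.0f}")
```

A scatter plot makes the downward linear trend (and hence the intercept) obvious at a glance, which is exactly the slide's point: the table hides what the eye finds instantly.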
14. How does visualization help?
Avoiding limitations of statistics
Showing patterns in large data sets with minimal information loss.
Revealing structure of "tricky" data sets where typical summary statistics do a poor job.
15. Showing patterns in large data sets
https://www.facebook.com/notes/facebook-engineering/visualizing-friendships/469716398919
16. Describing statistically tricky data
http://www.stanford.edu/~mwaskom/software/seaborn/examples/anscombes_quartet.html
Anscombe's Quartet (Francis Anscombe, 1973)
All four have the same:
mean(x)
variance(x)
mean(y)
variance(y)
correlation coefficient
regression coefficients
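The quartet's point can be checked directly. This sketch, using NumPy and the data values from Anscombe's 1973 paper, prints near-identical summary statistics for all four sets, even though their scatter plots look completely different:

```python
import numpy as np

# Anscombe's Quartet (Anscombe, 1973): four data sets with near-identical
# summary statistics but wildly different shapes when plotted.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I":   (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  ([8] * 7 + [19] + [8] * 3,
            [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

for name, (x, y) in quartet.items():
    x, y = np.array(x, dtype=float), np.array(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    r = np.corrcoef(x, y)[0, 1]
    print(f"{name}: mean(x)={x.mean():.2f} var(x)={x.var(ddof=1):.2f} "
          f"mean(y)={y.mean():.2f} r={r:.3f} fit: y={intercept:.2f}+{slope:.2f}x")
```

Every line prints mean(x)=9.00, var(x)=11.00, mean(y)≈7.50 and r≈0.816, which is precisely why summary statistics alone are not enough.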
17. Describing statistically tricky data
Much web data, especially involving human preferences or choices, looks like this.
There is no "central tendency", so typical descriptive statistics are useless.
Zipfian distribution: an example of a power law.
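To see why a "central tendency" is meaningless here, draw a Zipfian sample and compare its median and mean. A sketch using NumPy; the exponent of 2 is an arbitrary illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(42)
# Draw a Zipf-distributed sample: P(X = k) proportional to 1 / k**2.
sample = rng.zipf(2.0, size=100_000)

# The median sits at the very bottom of the range, while the mean is
# dragged far above it by a handful of enormous values in the tail.
print("median:", np.median(sample))
print("mean:  ", round(sample.mean(), 1))
print("max:   ", sample.max())
```

No single number summarizes such a sample well; the mean keeps drifting as the sample grows, which is why plotting the distribution beats quoting an "average".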
18. How does visualization help?
Illustrating a story
Visualizations are often used simply to clarify or reinforce the main points of a story, narrative or message.
This process fails when the conclusions suggested by the graphic are irrelevant to the narrative, or even contradict it.
It can also fail when the graphic has no clear message, has multiple conflicting interpretations, or is largely incomprehensible.
Many of the following examples illustrate these mistakes.
24. Example from Stephen Few (PDF)
Dual axes: caution
Natural interpretation: Units sold "dipped below" revenue (A) and is now "catching up" (B).
But these impressions are meaningless. They are just artefacts of the chosen axis scales.
25. Proportionality errors
From an Australian document found at The Guardian
1 row of people = roughly 43,000 nurses.
10 rows = roughly 48,000 nurses.
?!?
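Tufte's "lie factor" puts a number on this kind of distortion: the size of the effect shown in the graphic divided by the size of the effect in the data, where each effect size is the relative change between the two values. Using the rough figures quoted above:

```python
# Tufte's lie factor: (effect shown in graphic) / (effect in the data),
# where "effect" is the relative change between the two values.
rows_shown = (1, 10)           # rows of people drawn in the graphic
nurses = (43_000, 48_000)      # the values those rows represent

effect_in_graphic = (rows_shown[1] - rows_shown[0]) / rows_shown[0]  # 9.0 (a 900% increase)
effect_in_data = (nurses[1] - nurses[0]) / nurses[0]                 # ~0.116 (an 11.6% increase)

lie_factor = effect_in_graphic / effect_in_data
print(f"lie factor: {lie_factor:.0f}")
```

A lie factor near 1 means the graphic is honest; here an ~12% increase is drawn as a 900% increase, a lie factor of roughly 77.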
28. Axis inversion: when “down” means “up”?!?
From Thomson Reuters via Business Insider
Version published by Reuters
Version "fixed" by @PFedewa
29. Bad dataviz
2. Distance vs. area vs. volume
http://muhammadfamizwanabdullah.blogspot.co.uk/2010/11/10-introduction-of-teaching-volume-of.html
30. Pie charts: avoid
Bad: Colours are used for separating slices, so they can't easily be put to another use. No way to show a time dimension statically. Comparing relative sizes of slices is hard.
Worse: Doing it in 3D is harder. Perspective inflates nearer slices, and the similar volume of the objects is a red herring.
Worst: Doing it with deep, discontinuous 3D objects is even harder.
31. Pie charts: avoid
Perhaps justifiable (in 2D) if numbers are sufficiently different.
Otherwise, use a much simpler design and avoid all those problems.
33. Pie chart horrors
From a World Bank report (PDF) found at The Guardian
These ones show 96% and 40% as full circles.
This one is falling apart.
This one thinks 76% is less than three quarters.
34. Even worse uses of 3D
https://www.tableausoftware.com/public/blog/2011/01/viz-wiz-1-11
and http://www.simplexnumerica.com/Gallery/gallery_pyramid.html
Cones, pyramids, spheres etc…
Are we comparing width, height, area or volume? Nobody knows!
26.76% = tiny peak
23.32% = massive slab
?!?
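The ambiguity is not just aesthetic: if equal values are drawn as equal-height slices of a cone, the volumes differ enormously. A quick check (a sketch assuming a cone cut into four equal-height slices, which may not match the exact charts pictured):

```python
# A cone's volume from the apex down to height h grows like h**3, so cutting
# it into equal-height slices gives volumes proportional to differences of cubes.
slices = 4
volumes = [(k + 1) ** 3 - k ** 3 for k in range(slices)]
print("relative slice volumes, apex to base:", volumes)  # [1, 7, 19, 37]

# Equal data values, but the base slice has 37x the volume of the tip slice.
print("base/tip volume ratio:", volumes[-1] / volumes[0])
```

So even if the reader guesses "volume" as the encoding, four identical values look like a 37-to-1 difference.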
35. Stacked charts: caution
Stacked charts show how a data series breaks down by another attribute of the data.
But people often misread these as two distinct data series, reading off a separate y-axis value for each one.
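The misreading is easy to pin down with numbers: the upper boundary of each band is a running total, not the series itself. A sketch with invented figures:

```python
# Two components of a total, by quarter (invented figures).
product_a = [40, 45, 50, 55]
product_b = [30, 25, 35, 30]

# What a stacked chart draws as the top of the upper band: A + B.
top_of_b = [a + b for a, b in zip(product_a, product_b)]
print("top of upper band:", top_of_b)   # [70, 70, 85, 85]
print("actual product B: ", product_b)  # [30, 25, 35, 30]
# Reading the band boundary as "product B's value" more than doubles it.
```

If readers need the individual series, small multiples or plain overlaid lines are usually safer than stacking.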
39. Non-normalized quantities are useless
http://personal.frostburg.edu/jibandy0/starbucks%20map.jpg
Don’t use absolute values without a very good reason.
Normalize appropriately: per capita, per adult, per student, per household, per square km, per journey, per voter …
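Why normalization matters is easy to show with invented figures (these are not real data for the Starbucks map above):

```python
# Invented figures: (stores, population) per region.
regions = {
    "Big State":   (1_000, 20_000_000),
    "Small State": (150, 1_500_000),
}

for name, (stores, pop) in regions.items():
    per_100k = stores / pop * 100_000
    print(f"{name}: {stores} stores, {per_100k:.1f} per 100k people")

# Big State "wins" on the raw count (1,000 vs 150), but Small State has
# twice the per-capita density -- the absolute-values map tells the wrong story.
```

The raw-count map rewards population size, not the phenomenon being mapped.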
40. Remember: geopolitical boundaries are artificial
This map shows all the countries I’ve visited.
The relative size of the USA makes me seem much more widely travelled than I really am.
Is “country” the right level of aggregation?
44. Drawbacks of maps
● Can’t easily show time dimension, without animation
● Hard to show multiple attributes of data at once
● Physical proximity can obscure demographic/cultural differences, and vice versa
Just because you can map the data, doesn’t mean you should.
Save maps for when geographical trends are the key focus.
47. Diverging data
http://www-03.ibm.com/press/us/en/pressrelease/35359.wss
Here the yellow section indicates the median. Red/green = above/below median.
However, the red and green ranges are not scaled well: 75 (close to the median) is almost the same colour as 108 (the max).
Sequential data, but with a well-defined midpoint. Two directions from this midpoint -- two poles: above/below average, positive/negative, female/male, Democrat/Republican etc.
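In matplotlib (3.2+), anchoring a diverging colour scale at the midpoint is one line via `TwoSlopeNorm`. A sketch using the values quoted above, assuming a median of 74 (the slide only says 75 is close to it) and an invented minimum of 40:

```python
from matplotlib.colors import TwoSlopeNorm  # matplotlib >= 3.2

# Pin the neutral middle colour to the midpoint, so values near the median
# look neutral and only the extremes get fully saturated colours.
norm = TwoSlopeNorm(vmin=40, vcenter=74, vmax=108)

print(float(norm(74)))   # 0.5: the median maps to the neutral midpoint
print(float(norm(75)))   # just above 0.5: visibly near-neutral
print(float(norm(108)))  # 1.0: only the true maximum gets the extreme colour
```

Passing `norm=norm` together with a diverging colormap (e.g. `RdYlGn`) to a plotting call fixes exactly the flaw in the IBM example: 75 would no longer look like 108.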
48. Categorical data
Also known as nominal or qualitative.
Colours should not form a pattern, as this can imply a false relationship.
The ethnicity colours here are reasonable, although quite close in colour space.
The location colours are badly chosen. They suggest a linear progression, which is meaningless.
http://www.visualizing.org/full-screen/10886
50. Other considerations
● Colour blindness -- nearly 10% of men -- rare in women
● Print and photocopy friendliness
● Characteristics of different screens, esp. projectors
ColorBrewer is a great help: http://colorbrewer2.org/
See also…
● brewer2mpl (Python)
● RColorBrewer (R)
● ColorBrewer (Matlab)
52. Beware of bogus correlations
http://gizmodo.com/5977989/internet-explorer-vs-murder-rate-will-be-your-favorite-chart-today/
and http://pubs.acs.org/doi/abs/10.1021/ci700332k
Correlation does not prove causation, even with a good R² score.
53. Beware of bogus correlations
Even respectable journals sometimes get carried away.
Ask yourself: Are these both effects of a common cause? Or just sheer chance? (Multiple comparisons)
http://www.nejm.org/doi/full/10.1056/NEJMon1211064
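The multiple-comparisons point can be reproduced in a few lines: compare one series against enough unrelated random series, and some will correlate impressively by chance alone. A sketch using NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)
n_points, n_candidates = 20, 1_000

target = rng.normal(size=n_points)                       # one 20-point series
candidates = rng.normal(size=(n_candidates, n_points))   # 1,000 unrelated series

# Correlate the target against every candidate and keep the best match.
corrs = [abs(np.corrcoef(target, c)[0, 1]) for c in candidates]
best = max(corrs)
print(f"best |r| among {n_candidates} random series: {best:.2f}")
# With this many tries, a "strong-looking" correlation is found by
# search alone -- not by any causal relationship.
```

This is why a chart that pairs two hand-picked time series (IE share vs. murder rate) can look compelling while meaning nothing.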
54. Bad dataviz
6. Trying to say too much
Each visualization needs a clear purpose. But some designers and analysts try to include every possible piece of information. This is not a good idea.
Unnecessary detail and ostentatiously “clever” presentation can obscure the real message.
56. Bad dataviz
7. Tips for developing a critical eye
Here are some techniques you can use for critical analysis.
They are often subjective, debatable, context-dependent and partly based on aesthetics… So don’t expect absolute rules.
57. Usability
Does the chart need detailed instructions in order for it to be comprehensible and usable?
● Acceptable if this is a standard visualization method used in a particular domain
● Less acceptable if this is a one-off for general consumption
58. First impressions test
What is the first thing you infer from looking at the visualization? (Don’t stop to read every detail -- see what you get from a glance.)
Does this impression prove to be accurate on closer inspection?
If not, then there may be a problem. Many people will only glance and never perform the close inspection.
60. Self-sufficiency test (Kaiser Fung)
Would the chart make sense without the numbers printed on each data point?
If not, the chart has failed the self-sufficiency test.
http://junkcharts.typepad.com/junk_charts/2013/03/blowing-the-whistle-at-bubble-charts.html
61. Trifecta checkup (Kaiser Fung)
Ask the following:
● What practical question does the graphic attempt to address?
● What answer does the data imply?
● What answer does the graphic imply?
Can you answer these clearly? Do the three answers align?
If not, there is something wrong.
http://junkcharts.typepad.com/junk_charts/2014/02/pets-may-need-shelter-from-this-terrible-chart.html
62. Data-ink score (Edward Tufte)
Main principle: Remove redundant or uninformative elements from the design, to reduce distraction. High data-ink ratio = clarity.
http://www.infovis-wiki.net/index.php/Data-Ink_Ratio
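In matplotlib terms, raising the data-ink ratio mostly means deleting default "furniture". A sketch with placeholder data:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3, 4], [3, 5, 4, 6])  # placeholder data

# Remove non-data ink: the box spines, tick marks, and any grid.
for side in ("top", "right"):
    ax.spines[side].set_visible(False)
ax.tick_params(length=0)
ax.grid(False)

fig.savefig("high_data_ink.png")
```

Everything removed here was ink that carried no data; the line itself is untouched, so the chart's information content is identical but less cluttered.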
63. And finally…
Ask yourself how much you trust the data.
Professional presentation does not imply reliable numbers.
Is there enough data to be sure of statistical significance?
What are the margins of error?
Is there a plausible mechanism of action?
What about sources of bias (accidental or intentional), confounding factors, missing data, or measurement error (noise)?