This slide deck is from a workshop that took place at the UNC Chapel Hill Davis Library Research Hub.
Collecting data is now easier than it has ever been. But as data becomes more prolific, datasets grow larger and more complex. How do we find meaningful patterns in our data? How can we communicate those patterns to others? Data visualization allows us to make sense of today's ever-evolving information landscape.
This workshop will introduce the history and basic principles of data visualization. Learn about best practices and resources for making an impact with your data through compelling charts, graphs and maps.
2. Why Visualize Data?
Data visualization is a tool that can help us explore and understand complex patterns in large quantities of data that cannot be directly perceived.
4. History of Data Viz: 1700–1800s
William Playfair and John Snow
• Playfair was the first to use line, bar, area and pie charts as visual symbols to represent data.
• Snow's use of a dot map to show geographic densities of cholera victims led to a new understanding of how disease spreads.
Source: Wikimedia Commons
5. History of Data Viz: 1700–1800s
Charles Joseph Minard
• Minard's depiction of Napoleon's advance on and retreat from Moscow is one of the first data visualization dashboards.
• It represents several types and dimensions of data in multiple, related charts.
Source: Wikimedia Commons
6. History of Data Viz: 1700–1800s
Length: Distance & Time
Width: Number of Men
Height: Temperature
7. History of Data Viz: 1700–1800s
W.E.B. Du Bois
• Social scientist, historian and civil rights activist
• Strikingly modern presentations of information, well ahead of their time
• "An honest straightforward exhibit of a small nation of people, picturing their life and development without apology or gloss, and above all made by themselves."
8. History of Data Viz: 1700–1800s
Source: Du Bois, W., Battle-Baptiste, W., & Rusert, B. (2018). W.E.B. Du Bois's Data Portraits: Visualizing Black America: The Color Line at the Turn of the Twentieth Century. Amherst, Massachusetts: Princeton Architectural Press.
9. Data Viz Today
• The complexity and variety of graphic symbols often rivals that of the data itself.
• Interactivity adds yet another dimension.
• How are we able to perceive and synthesize so much information?
Source: Weather Underground
11. Human Perception
Preattentive Processing
• The eye is drawn to certain features and patterns that "stand out" or pop.
• Information is subconsciously obtained from our environment before being attentively processed (or not).
• Pattern recognition
How many 7's do you see?
98187926383749819794961389746139
44978732184987621617995462132549
89796531859129939549719819295198
197354687929
(On the original slides this digit block is repeated with the 7's visually highlighted, demonstrating how a preattentive feature such as color or weight makes them pop out.)
12. Human Perception
Visual Features That Affect Preattentive Processing
Source: Healey, C. G., & Enns, J. T. (2012). Attention and Visual Memory in Visualization and Computer Graphics. IEEE Transactions on Visualization and Computer Graphics, 18(7), 1170-1188. doi:10.1109/tvcg.2011.127
13. Human Perception
Gestalt Principles
"The whole is different from the sum of its parts." - Kurt Koffka
Source: http://www.fusioncharts.com/blog/2014/03/how-to-use-the-gestalt-principles-for-visual-storytelling-podv/
14. Human Perception
Visual Memory
Stimulus
Saccadic Eye Movements
• Eyes gather information by
flitting over subject
• 2-5 eye movements per second
Iconic Memory
• Stores preattentive information:
color, shape, size, etc.
• Lasts only about 0.3 seconds
Working Memory
• Holds information from
iconic memory
• Retains 3-5 objects at a time
Long-Term Memory
• Receives and processes
information from working memory
• Reprocesses information when
something familiar is encountered
16. Selecting Visualizations
• Are you illustrating complex patterns
and/or large quantities of data?
• Are you answering a question, making an
argument, or telling a story?
• Will a visualization be more informative
than a simple table or text?
Step 1: Is a Visualization Necessary?
Source: https://web.archive.org/web/20091119151328/http://www.governor.maryland.gov/documents/November09BPW.pdf
17. Selecting Visualizations
• How much does your audience know
about the research subject?
• How much does your audience know
about data analysis?
• What are the norms and expectations
in this field?
Step 2: Who is Your Audience?
18. Selecting Visualizations
Step 3: What do you want to show your audience?
I WANT TO… SOLUTION
Compare values Bar/Column Chart
Show distribution of values Scatter Plot
Show trends in distribution of values Line Chart
Demonstrate how proportions compare to the whole Pie Chart; Stacked Bar Chart
Show three variables at once Bubble Chart
19. Selecting Visualizations
• There are many, many types of charts to
choose from.
• Some require specialized knowledge to
interpret correctly.
• Some are simply misleading.
• Always keep your audience in
mind when making a decision.
What about other charts?
Source: Wikimedia Commons
20. Selecting Visualizations
1. Should a visualization be used in this scenario?
2. Given the audience and the data, what is the best type of visualization to use?
You are a journalist for a national
paper. You are writing an article
about crime in the U.S.
Some believe that crime is on the
rise, while others argue that it has
diminished over the years.
The data tells a different story
depending on the type of crime,
geographic area and time period
that is being studied.
SCENARIO 1
You are a social media developer
for an ad agency. You provide one
of your clients with a weekly report
on brand activity.
Mentions of your client’s brand on
Twitter have decreased by 32% since
the previous week.
Sales have remained unchanged.
SCENARIO 2
You are a researcher studying
Parkinson’s disease.
You are publishing the
results of a study that uses
microarrays to measure gene
expression levels in mice.
Your dataset includes over
9,000 genes.
SCENARIO 3
23. Accuracy
• 1984 experiments by Cleveland and
McGill rank how accurately people assess
graphic depictions of data
• The rankings are useful but not
absolute
• Context and audience should also
be considered
Graphical Perception
Source: Cleveland, W. S., & McGill, R. (1984). Graphical Perception: Theory, Experimentation, and Application to the Development of Graphical Methods. Journal
of the American Statistical Association, 79(387), 531-554. doi:10.1080/01621459.1984.10478080
24. Accuracy
[Two charts of the same values for categories A, B and C, drawn at different y-axis scales (0-9 and 0-18).]
1. Which chart looks better to you?
25. Accuracy
[The same two charts of categories A, B and C at different y-axis scales (0-9 and 0-18), repeated.]
1. Which chart looks better to you?
2. Which chart makes it easier to judge the difference between B and C?
27. Best Practices
When using column/bar charts,
always start the scale at 0.
• The column for 1996 appears to be twice
the height of the column for 1993.
• Yet, the axis labels tell us the difference
between the two years is only 2.5%
(65% versus 62.5%).
• This is a common distortion tactic.
Source: http://junkcharts.typepad.com/junk_charts/2014/04/when-to-use-the-start-at-zero-rule-.html
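The distortion is easy to quantify. A small sketch using the slide's numbers (62.5% and 65%), with a hypothetical axis truncated to start at 60%:

```python
low, high = 62.5, 65.0   # values from the slide
baseline = 60.0          # hypothetical truncated axis start

true_ratio = high / low                                 # real size difference
apparent_ratio = (high - baseline) / (low - baseline)   # what the bars depict

print(round(true_ratio, 2), round(apparent_ratio, 2))  # → 1.04 2.0
```

A 4% real difference is rendered as one bar appearing twice the height of the other, which is exactly the effect the slide describes.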
28. Best Practices
Make sure scale is consistent and honest.
SAME DATA WITH 3 DIFFERENT SCALES DIFFERENT SCALES IN ONE GRAPH
Source: http://www.excelcharts.com/blog/redraw-troops-vs-cost-time-magazine/
29. Best Practices
With line graphs, consider
using a separate chart to
demonstrate differences.
• Human perception defaults to the
shortest distance between two lines
rather than the vertical distance.
• A chart of the difference alone can be
more accurate and informative.
[Figure: two lines annotated to contrast the vertical distance with the shortest distance between them.]
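One way to sidestep this illusion is to compute and plot the difference directly. A minimal sketch with invented series values:

```python
# Hypothetical monthly values for two series.
series_a = [10, 12, 15, 19, 24, 30]
series_b = [8, 9, 11, 14, 18, 23]

# Charting this difference on its own shows the true vertical gap,
# which the eye misjudges when both lines are drawn together.
gap = [a - b for a, b in zip(series_a, series_b)]
print(gap)  # → [2, 3, 4, 5, 6, 7]
```

Here the gap grows steadily, a trend that can look flat or ambiguous when the two lines are plotted on the same axes.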
32. Best Practices
Watch your data-to-ink
ratio:
• Only use graphical elements that
are necessary for the chart to be
easily read by your audience.
• Once you’ve completed a
visualization, check to see if
there’s anything that can be
removed.
Source: https://www.youtube.com/watch?v=CJouyPxHb84
33. Best Practices
Classic Trick of the Trade: The Squint Test
Which elements “pop out” and catch your eye?
Are these the elements you want to draw attention to?
Source: http://blog.xlcubed.com/2008/08/the-dashbord-squint-test/
34. Best Practices
Creativity is good, but clarity is key.
Source: http://www.scmp.com/infographics/article/1284683/iraqs-bloody-toll
Source: http://www.businessinsider.com/gun-deaths-in-florida-increased-with-stand-your-ground-2014-2
36. Color
Color isn’t always necessary.
• Many visualization tools add color
by default.
• Often a label on its own is enough.
• Color can be useful to distinguish
groups or intervals.
37. Color
Many people are color blind.
• Most people affected by color blindness
have diminished ability to differentiate red
and green.
• Don’t use color schemes that involve both.
• Vary the lightness/darkness/saturation of
colors as well. Check by printing or viewing in
grayscale.
Source: http://24ways.org/2012/colour-accessibility/
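The grayscale check can also be approximated in code. A sketch using the standard Rec. 601 luma weights; the two colors are typical chart reds and greens chosen for illustration, not taken from the slides:

```python
def grayscale(r, g, b):
    """Approximate perceived lightness using Rec. 601 luma weights."""
    return 0.299 * r + 0.587 * g + 0.114 * b

red   = (214, 39, 40)    # a typical chart red
green = (44, 160, 44)    # a typical chart green

# If two palette colors land close together on this 0-255 scale,
# they will be hard to tell apart in grayscale print.
print(round(grayscale(*red)), round(grayscale(*green)))  # → 91 112
```

A gap of only ~20 luma units suggests these two hues lean heavily on the red-green distinction, so varying lightness or saturation further would make the palette safer.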
38. Color
Natural scales are more accurately interpreted than “rainbow” scales.
Source: Borland, D., & Taylor, R. M., II. (2007). Rainbow Color Map (Still) Considered Harmful. IEEE Computer Graphics and Applications, 27(2),
14-17. doi:10.1109/mcg.2007.323435
Source: http://spatial.ly
40. Color
Resources for Working With Color
• Color Brewer (colorbrewer2.org) helps you
select from a range of color scales that are
friendly to color blindness, printers, etc.
• Coblis (color-blindness.com/coblis-color-blindness-simulator) allows you to upload
images and displays how they will appear to
someone who is colorblind.
41. Color
Color isn’t always the best way to group data.
• Often the same colors are perceived and
named differently, making it difficult to use
color as a guide for reference and discussion.
• Color Palette Analyzer (vis.stanford.edu/color-names/analyzer) shows how often names for
different colors overlap.
• Alternative grouping methods include gestalt
principles described earlier as well as trellis
charts (a.k.a. panel charts or small multiples).
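Small multiples are straightforward to build in most plotting libraries. A minimal matplotlib sketch (the group names and values below are invented for illustration):

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt

# One small panel per group instead of one multi-colored chart.
groups = {"North": [3, 5, 4, 6], "South": [2, 2, 3, 3], "West": [6, 4, 5, 7]}

fig, axes = plt.subplots(1, len(groups), sharey=True, figsize=(9, 3))
for ax, (name, values) in zip(axes, groups.items()):
    ax.plot(values, color="black")  # no color coding needed
    ax.set_title(name)
fig.savefig("small_multiples.png")
```

Because the panels share a y-axis (`sharey=True`), groups stay directly comparable without asking readers to decode or name colors.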
47. How can this visualization be improved?
Which incredible Canadian woman should be featured
on the $20 bill?
http://www.cbc.ca/radio/dnto/just-get-the-thing-done-and-let-them-howl-nellie-mcclung-1.3418924/which-incredible-canadian-woman-should-be-featured-on-the-20-bill-1.3419070
49. How can this visualization be improved?
Source: https://www.codot.gov/library/traffic/safety-crash-data/fatal-crash-data-city-county/historical_fatals.pdf/at_download/file
50.
MONTH  LOWEST YEAR  HIGHEST YEAR
JAN    2002         2010
FEB    2004         2011
MAR    2004         2010
APR    2003         2011
MAY    2002         2010
JUN    2002         2009
JUL    2004         2010
AUG    2002         2011
SEP    2015         2014
OCT    2002         2013
NOV    2002         2013
DEC    2005         2006
[Bar chart: Colorado Fatal Crashes in 2016, Jan-Aug values 29, 26, 38, 36, 45, 60, 65, 40.]
[Chart: Colorado Fatal Crashes by Month Since 2002, showing MIN, AVG and MAX for Jan-Dec.]
51. Key Concepts
• Visualizations are most informative when complex data is used to tell a story.
• Consider your audience at all times.
• Certain graphical elements can be more accurately perceived than others.
• Avoid mixing and manipulating scales. Be honest with your presentation.
• Consider which elements of your visualization “pop out” and attract the most attention.
• Avoid color schemes that conflict with red-green color blindness.
• Consider whether multiple charts will be more informative or easier to read than a single chart.
• Clarity is key.
57. Resources – Reading
Books
Tufte, Edward
• The Visual Display of Quantitative Information (2001)
• Visual Explanations (1997)
• Envisioning Information (1990)
Cleveland, William S
• Visualizing Data (1993)
• The Elements of Graphing Data (1985)
Blogs
Flowing Data
flowingdata.com
Information is Beautiful
informationisbeautiful.net
Junk Charts
junkcharts.typepad.com