The document discusses how humans see and understand data visualizations. It explains that visualizations should leverage the human visual system by using pre-attentive attributes like position, length, and angle. The most important data elements should be encoded using the highest ranked methods like position on a common scale. Other tips include avoiding pie charts and stacked bars, and leveraging principles from Gestalt psychology like continuity, proximity and closure to aid human pattern recognition and understanding. The goal of visualization is effective communication, so designs should optimize for human perception of variations in the data.
31. The most important measurement should exploit
the highest ranked encoding possible.
• Position along a common scale
• Position on identical but nonaligned scales
• Length
• Angle or Slope
• Area
• Volume or Density or Color saturation
• Color hue
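As a rough illustration of the rule above, the ranking can be treated as an ordered list, with the most important measurement given the first encoding still available. This is a minimal sketch and not code from the presentation; the function name `assign_encodings` and the example field names are hypothetical.

```python
# Encodings ordered from highest to lowest ranked, exactly as listed on the slide.
ENCODING_RANK = [
    "Position along a common scale",
    "Position on identical but nonaligned scales",
    "Length",
    "Angle or Slope",
    "Area",
    "Volume or Density or Color saturation",
    "Color hue",
]


def assign_encodings(measurements_by_importance, available=ENCODING_RANK):
    """Pair each measurement with the highest ranked encoding still free.

    `measurements_by_importance` is ordered most important first.
    """
    # Keep the slide's ordering even if `available` arrives in another order.
    ranked = sorted(available, key=ENCODING_RANK.index)
    if len(measurements_by_importance) > len(ranked):
        raise ValueError("more measurements than available encodings")
    return dict(zip(measurements_by_importance, ranked))


# Hypothetical fields: the key comparison gets position, the rest follow.
plan = assign_encodings(
    ["revenue", "region", "product"],
    available=["Color hue", "Length", "Position along a common scale"],
)
for field, encoding in plan.items():
    print(f"{field}: {encoding}")
```

With only three encodings on offer, the most important field takes position along a common scale and the least important is left with color hue.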
34. “The first rule of color:
do not talk about color!”
- Tamara Munzner
85. Piecharts are the information visualization
equivalent of a roofing hammer to the
frontal lobe. They have no place in the world
of grownups, and occupy the same semiotic
space as short pants, a runny nose, and
chocolate smeared on one’s face. They are
as professional as a pair of assless chaps.
http://blog.codahale.com/2006/04/29/google-analytics-the-goggles-they-do-nothing/
90. Tables are preferable to graphics for many small
data sets. A table is nearly always better than a
dumb pie chart; the only thing worse than a pie
chart is several of them, for then the viewer is
asked to compare quantities located in spatial
disarray both within and between pies… Given
their low data-density and failure to order
numbers along a visual dimension, pie charts
should never be used.
-Edward Tufte, The Visual Display of Quantitative Information
92. Who do you think did a better job in tonight’s debate?

                     Clinton   Trump
Among Democrats        99%       1%
Among Republicans      53%      47%
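The same four numbers can also be placed as positions along one common scale, the highest ranked encoding in the list earlier in the deck. The sketch below is an illustration using only the Python standard library, not the presentation's R/ggplot2 code; the helper names `dot_row` and `dot_plot` are hypothetical.

```python
# Values from the slide: share saying each candidate did the better job.
results = {
    "Among Democrats": {"Clinton": 99, "Trump": 1},
    "Among Republicans": {"Clinton": 53, "Trump": 47},
}


def dot_row(percent, width=50):
    """Place one marker on a 0-100% scale that is `width` characters wide."""
    position = round(percent / 100 * (width - 1))
    return "." * position + "o" + "." * (width - 1 - position)


def dot_plot(data, width=50):
    """Return text lines in which every value shares one horizontal scale."""
    lines = []
    for group, shares in data.items():
        lines.append(group)
        for name, percent in shares.items():
            lines.append(f"  {name:<8}|{dot_row(percent, width)}| {percent}%")
    return lines


print("\n".join(dot_plot(results)))
```

Because every row uses the same 0 to 100% axis, the near tie among Republicans and the lopsided split among Democrats can be compared by marker position alone.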
117. Cleveland’s three visual operations
of pattern perception:
1. Detection
2. Assembly
3. Estimation
191. Q: Should I include 0 on my scale?
A: It depends.
192. Q: Should I include 0 on my scale?
A: Relying on the pre-attentive
perception of size or intensity?
Yes, otherwise you will mislead.
Using position? It’s up to you.
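A small worked example may help show why size-based encodings need a zero baseline. A bar's drawn length is the distance from the axis baseline to the value, so truncating the axis changes the ratio a viewer perceives. The values and the function name `apparent_ratio` are hypothetical and chosen only for illustration.

```python
def apparent_ratio(a, b, baseline=0.0):
    """Ratio of the drawn bar lengths for values a and b.

    Each bar is drawn from `baseline` up to its value, so a baseline
    other than zero changes the ratio between the two lengths.
    """
    if min(a, b) <= baseline:
        raise ValueError("baseline must lie below both values")
    return (a - baseline) / (b - baseline)


# Hypothetical values: 105 is only 5% larger than 100.
true_ratio = apparent_ratio(105, 100)               # axis starts at 0
truncated = apparent_ratio(105, 100, baseline=95)   # axis starts at 95

print(f"ratio with zero baseline: {true_ratio:.2f}")      # 1.05
print(f"ratio with truncated axis: {truncated:.2f}")      # 2.00
```

With the axis starting at 95, a 5% difference is drawn as a bar twice as long. A dot plot of the same values relies on position instead of length, which is why the choice of baseline is left to the designer in that case.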
200. “Above all else, show
the variation in the data.”
-Rauser (via Tufte)
201. R/ggplot2 code for every plot in this
presentation is available at http://goo.gl/xH5PLV
The rendered document is at
http://rpubs.com/jrauser/hhsd_notes
This presentation is at
https://goo.gl/LuDNje
I will tweet these links as @jrauser