Creating Effective Data
Visualizations in Excel 2016:
Some Basics
SIDLIT 2017
Aug. 3 – 4, 2017
Presentation Overview
• One of the mainstays of a modern software toolkit is Excel 2016, from
Microsoft Office 2016. By re...
Presentation Overview (cont.)
• In this session, participants will
• review how to load a data table,
• read the general da...
Presentation Order
• Sourcing Datasets
• Reading General Data in a Data Table / Worksheet
• Processing or Cleaning Data
• ...
Presentation Order (cont.)
• Adding Relevant Data Visualization Elements
• Processing Graph Visualizations Outside of Exce...
Sourcing Datasets
6
Sourcing Datasets
• Downloading public datasets from sites like data.gov
• Capturing the back data about how the publicall...
Sourcing Datasets (cont.)
• Downloading data from social media platforms
• These may include Facebook poststreams, Twitter...
Sourcing Datasets (cont.)
• Autogenerating data…
• From online research suites (often used to test surveys)
• From graph v...
Data Analytics Suites
• Some datasets may be exported from data analytics suites.
• SPSS, RapidMiner Studio, R, Python, an...
Data Capture and Pre-Processing
• Prior to importing data into Excel, it is likely that the data is pre-processed /
cleane...
Data Everywhere and Fungible
• In other words, it is possible to datafy a lot of things.
• There is data everywhere…
• It ...
Reading General Data in a Data
Table / Worksheet
13
Structured Data
Structured data is labeled by row and column
headers
Such data is categorize-able by type and
common chara...
“Unstructured” or
“Semi-Structured”
Data
Text sets, bags of words
Image files
Audio files
Video files
Multimedia, and othe...
Basic “Structured” Data Structures
• Column A tends to contain unique identifiers for the row data
• Row 1 tends to contai...
Coding “Structured” Data
• Structured data generally has a long history of conventional statistical
approaches to analysis...
Basic “Unstructured” and “Semi-Structured”
Data Structures
• Language data tends to have an inherent structure based on ho...
Coding “Unstructured” and “Semi-Structured”
Data
• So-called “unstructured” or “semi-structured” data are coded in a
varie...
Coding “Unstructured” and “Semi-Structured”
Data (cont.)
• Such unstructured / semi-structured data are multi-dimensional,...
Coding “Unstructured” and “Semi-Structured”
Data (cont.)
• There are some text corpora which are non-consumptive, which
me...
Coding “Unstructured” and “Semi-Structured”
Data (cont.)
• Such data may be coded by humans alone, computer alone, or a
cy...
23
24
25
Unlinked or Linked Data Tables
Flat Files
• Data tables treated as single
stand-alone files that may be
assessed alone or ...
Processing or Cleaning Data
27
Some Common Questions for Data Processing
or Cleaning
Structured Data
• How should missing data be
handled? (Should empty ...
Some Common Questions for Data Processing
or Cleaning (cont.)
Structured Data
• In a set of parametric data, how
should ex...
Some Common Questions for Data Processing
or Cleaning (cont.)
Structured Data
• There may be benefits to
combining multipl...
Some Common Questions for Data Processing
or Cleaning (cont.)
Structured Data
• The original labeling of online
data from ...
Some Common Questions for Data Processing
or Cleaning (cont.)
Structured Data
• In combining manual coding for
a team code...
Some Common Questions for Data Processing
or Cleaning (cont.)
Structured Data
• An online survey system has
accidentally c...
Some Common Questions for Data Processing
or Cleaning (cont.)
Structured Data
• In many fields, original datasets
have to ...
Some Common Questions for Data Processing
or Cleaning (cont.)
Structured Data
• In some cases, conceptual data
may be appl...
General Points about Data Processing /
Cleaning
• There should be clear principles and rationales for how data is
handled....
General Points about Data Processing /
Cleaning (cont.)
• There should be clear steps and processes applied to data process...
Using the “Recommended
Charts” Feature in Excel 2016
38
Accessing the “Recommended Charts”
Feature
• To access this feature, highlight the desired data to map (from the
dataset),...
About the “Recommended Charts” Feature
• This “recommended charts” feature offers some cognitive scaffolding
to new data v...
Selecting the Right Amount of Data to Map
• Every selected cell of data (even the empty ones) contains meaning in the data
vi...
Accessing the “All Charts” Feature
• There is one tab that offers some “Recommended Charts”. The tab
next to it offers “Al...
The “Recommended Charts” Feature
• This Excel feature assesses the types of data in the dataset or
worksheet and proposes ...
The “Recommended Charts” Feature (cont.)
• If too much data has been highlighted, then a message will be
shown. It will r...
The “Recommended Charts” Feature (cont.)
• This feature is a generalized one and does not include deep or unique
or inside...
46
47
48
49
[Chart: Four Text Feature Scores. Y-axis: Percentile Rankings, 0 to 100. Categories: Analytic, Clout, Authentic, Tone.]
Four Text An...
50
Abstracting Core Descriptive Functions in
Data Visualizations
• Proportionality (“intensity”)
• Frequency counts
• Pie cha...
Abstracting Core Descriptive Functions in
Data Visualizations (cont.)
• Social relationships
• Intercommunications, follow...
Abstracting Core Analytical Functions in Data
Visualizations: Deductive, Inductive, Inferential
• Data relationships
• Ass...
Filtering Data
• The “Sort & Filter” option enables users to select a column or
segment of a column to alphabetize or sort...
Filtering Data (cont.)
• A “Sort Warning” window asks whether the user wants to “Expand
the selection” or just “Continue w...
Selecting Data Visualization Types
56
What Follows
• In the following section are the main types of data graphs enabled in
Excel with its built-in charting feat...
Column Charts
• Column graphs tend to be vertical (vs. horizontal).
• In other words, they tend to align with the plac...
59
Data Structure for the Vertical Column or Bar
Chart on the Prior Slide
pronoun ppron i we you shehe they ipron article pr...
61
Data Structure for the Stacked Bar Chart in
the Prior Slide
Very Negative Moderately Negative Moderately Positive Very Pos...
63
Group Selfies Dronies
Babies 7 3
Children 63 21
Teens 20 9
20s and 30s 943 168
40s 49 10
50s 25 7
60s 12 2
70s and olde...
Line Charts
• Line graphs tend to be horizontal
• Line graphs may represent changes over time
• In such cases, time is rep...
Line Charts (cont.)
• Line graphs may have two different variables with one represented on the
x-axis and one on the y-axis
• T...
66
Month Count
Sept. 2015 83
Oct. 2015 222
Nov. 2015 51
Dec. 2015 15
Jan. 2016 4
Feb. 2016 2
67
HLY-TEMP-NORMAL HLY-DEWP-NORMAL
34.9 28.9
34.4 28.7
33.9 28.4
33.4 28.3
33.1 28
32.7 27.9
32.5 27.7
32.3 27.6
32.1 27.4...
68
Very Negative Moderately Negative Moderately Positive Very Positive
SpaceX Public Group FB
210 305 399 245
Tesla Motors...
Pie Charts
• Pie charts are used to represent (related) proportions of a whole.
Proportions are determined numerically—by ...
70
creation_pending 65
deleted 2112
pre_registered 13795
registered 84351
100323
71
DELETE (Remove) 163
GET (Read) 824038
HEAD (Retrieve Resource) 204
PATCH (Update, Modify) 12
POST (Create) 218950
PUT (...
72
Course 32936
Group 14576
47512
Bar Charts
• Bar charts use rectangular shapes (and the sizes of these shapes) to
indicate quantities and intensities.
• B...
74
auto_graded 239430
human_graded 1070322
not_graded 787398
2097150
75
Data Structure for the 100% Bar Chart on the
Prior Slide
Very Negative Moderately Negative Moderately Positive Very Positi...
77
Data Structure and Source for Stacked Bar
Charts in Prior Slide
• Data from “Comparative Analysis of 4-H Enrollment and U....
79
Main Time Zones
Alaska 3
Arizona 7
Asuncion 1
Auckland 1
Baghdad 1
Bangkok 1
Beijing 1
Bogota 1
Brasilia 2
Bucharest 1
...
Area Charts
• Area charts are built from line charts.
• In these, the areas under the respective lines are filled in with ...
81
Null 769,964
1 246,371
2 20,907
3 6,749
4 1,762
5 1,239
6 368
7 212
8 309
9 138
10 104
11 39
12 20
13 32
14 24
15 23
16 2...
82
Group Selfies Dronies
Babies 7 3
Children 63 21
Teens 20 9
20s and 30s 943 168
40s 49 10
50s 25 7
60s 12 2
70s and olde...
83
Year (All)
Data
Region State Detail of Male Detail of Female Detail of Youth Detail of 4H Units
CENTRAL Illinois 204,15...
84
Group Selfies Dronies
Babies 7 3
Children 63 21
Teens 20 9
20s and 30s 943 168
40s 49 10
50s 25 7
60s 12 2
70s and olde...
85
Date selfie selfie guy
1/1/2004 -0.583 -0.608
2/1/2004 -0.583 -0.608
3/1/2004 -0.583 -0.608
4/1/2004 -0.583 -0.608
5/1/...
Data Structure and
Source for Area Chart
in Prior Slide
Comparison of search frequencies for “selfie”
and “selfie guy” on ...
X Y (Scatter) Charts
• Scatter graphs (aka scatter plots or scatter diagrams) capture two sets
of point data.
• On the res...
88
open high low close
182.53 183.4 182.53 183.385
181.75 182.46 181.61 182.06
179.42 180.93 179.42 180.38
178.74 179.82 1...
89
http://www.nasdaq.com/symbol/pg/historical
Data Structure of the Scatter Graph from the
Prior Slide
90
date close volume open high low
15:26 90.13 5,975,886 89.55 90...
91
Data Structure of the Scatter Graph from the
Prior Slide
92
date close volume open high low
16:00 30.61 6,225,686 30.47 30...
Stock Charts
• Stock graphs (sometimes referred to as OHLC or “open-high-low-close
chart”) show the ups and downs in stock valua...
Stock Charts (cont.)
• The three examples were created from the online Nasdaq historical
data site. Their “quotes” tab enab...
95
date open high low close
10:07 182.53 183.4 182.53 183.385
4/24/2017 181.75 182.46 181.61 182.06
4/21/2017 179.42 180.9...
96
date open high low close
10:24 865 867.5 862.81 866.64
4/24/2017 851.2 863.45 849.86 862.76
4/21/2017 842.88 843.88 840...
97
date volume open high low close
10:39 1,740,869 308 309.25 305.86 309.06
4/24/2017 5077771 309.22 310.55 306.0215 308.0...
Surface Charts
• Surface graphs are 3-dimensional (3D) graphs with x, y, and z axes.
• The setup for a surface graph requi...
Surface Charts (cont.)
The data should be structured as a matrix or
what some call a “mesh” because this
information will b...
Surface Charts (cont.)
• Surface charts enable the visualizing of some interaction between the
data represented in the x-a...
Surface Charts (cont.)
• 3D visualizations are difficult for people to use because data may be
occluded or difficult to se...
102
103
104
105
Data Structure for the Prior Four Surface
Graphs (a selection of data)
106
308 309.22 302 306.51 302.46 299.7 302.7 296.7 ...
Radar Charts
• Radar graphs, also known as spider graphs / charts, show quantitative
measures on axes emanating from a cen...
108
insight cause discrep tentat certain differ
2.62 3.11 0.91 2.32 0.99 2.89
109
affiliation achieve power reward risk
1.86 1.93 3.05 0.73 0.45
110
see hear feel
0.61 0.35 0.21
111
Analytic Clout Authentic Tone
Area chart - Wikipedia.pdf 97.06 51.71 29.09 58.03
Bar chart - Wikipedia.pdf 97.45 53...
Treemap Charts
• Treemap diagrams are rectangular diagrams which convey frequency
in terms of spatial area of smaller rect...
113
assignment 61385
graded_survey 1123
practice_quiz 2962
survey 896
66366
114
Word Count
9465008123 5035
amazon 2916
2017 2861
https 2712
com 2063
just 873
like 783
get 771
10154846242183124 734
...
115
Very negative Moderately negative Moderately positive Very positive
1 : InternalsAmazon (@amazon) ~ Twitter 37 73 176...
Sunburst Charts
• Sunburst diagrams originated from pie charts. In sunburst diagrams,
variables are depicted as portions of...
117
Data Structure of the
Sunburst Diagram in
the Prior Slide
Nodes Sub-nodes No. Coding References
account account access 7
a...
119
Name Sources References
beautiful 1 782
day 1 4
employment 1 8
event 1 8
everyone 1 4
flags 1 5
friendly reminder 1 12...
120
✔ ✔ apps 1
✔ ✔ game 1
✔ ✔ income jaction 1
✔ ✔play store 1
delivery date estimated delivery date 8
delivery date false...
Histogram Charts
• Histogram charts show the frequency distribution of numerical data
over the comprehensive range of pos...
122
123
Data Structures for the Two Related
Histograms in the Prior Two Slides
124
Bins Group Selfies Frequencies Bins Dronies Fre...
125
Data Structure for the Theme Histogram in
the Prior Slide
126
A : company B : engine C : engineering D : landing E : lau...
Box & Whisker Charts
• Box and whisker diagrams enable the visualization of groups of numerical
data in quartiles (data br...
Box & Whisker Charts (cont.)
• Skewness shows the tendency of the data: whether there are more
scores that trend high or tren...
129
Data Structure for the Box & Whisker Plot in
the Prior Slide (partial snippet)
130
YearStart YearEnd LocationAbbr Locatio...
131
Data Structure for the Box & Whisker Plot in
the Prior Slide (partial snippet)
132
Hospital Referral Region Description T...
Waterfall Charts
• Waterfall diagrams (aka “flying bricks chart” or “Mario chart,” or
“bridge” in finance) capture interme...
Waterfall Charts (cont.)
• This graph displays “the cumulative effect of sequentially introduced
positive or negative valu...
135
Data Structure for the Waterfall Chart in the
Prior Slide
136
Base Fall Rise Total
4/24/2017 30.35 0
4/21/2017 30.7 0 30.7...
137
Data Structure for the Waterfall Chart in the
Prior Slide
138
Dates Base Fall Rise Total Changes
4/3/2017 3.95 0 0 0
4/4/2...
Combo Chart
• Combination graphs are those which mix data and present the
findings in creative interlinked ways (optimally...
140
Data Structure for the Combo Chart in the
Prior Slide
141
function pronoun ppron i we you shehe they ipron article prep au...
3D Maps Geographical Imagery
• The 3D Maps imagery is related to locational mapping on a digital 3D
globe.
• There should ...
3D Maps Geographical Imagery (cont.)
• To set up data for 3D imagery, set up some locations: city,
state/province, country...
144
Data Structure for the 3D Image in the Prior
Slide
City State Country Years of Residence
145
Some Tips for Creating Data Visualizations in
Excel 2016
• Do a mental walk-through of the underlying data.
• Consider wha...
Going “Off-Script” within Excel
Going with data visualization templates in Excel is a very fast way to portray
structured ...
(1) A Composite Multi-Graph Image
• Let’s say that there is a need to create multiple graphs that are
interrelated and nee...
(2) Back-to-Back Bar Charts
• Begin with a set of relatively comparable data with the same variables
being compared (with ...
(2) Back-to-Back Bar Charts (cont.)
• Create a name label for the data visualization using a text box.
• For one of the tw...
(2) Back-to-Back Bar Charts (cont.)
• Add a white background to the image, so that the Excel cells do not
show up.
• If fu...
(2) A Rough Example of a Back-to-Back Bar
Chart
152
(3) A Stacked Pyramid Chart
• Create a list of frequency data.
• Highlight the frequency data, and filter from largest to ...
(3) A Stacked Pyramid Chart (cont.)
• With the chart highlighted, go to the Design tab, and click “Switch
Row/Column.” The...
(3) A Stacked Pyramid Chart (cont.)
• Right-click one of the placeholder layers in the visualization, and go to
the Format...
(3) A Stacked Pyramid Chart
156
Some Common Mistakes
157
Some Common Mistakes
• Not ensuring that the underlying data behind a data visualization is
correct
• A lack of alignment ...
Some Common Mistakes (cont.)
• An incoherent data visualization enabling a wide variety of
misinterpretations (or conflicti...
Some Common Mistakes (cont.)
• Excess data in the data visualization (such as extra decimal places for
whole numbers for a ...
One Main Realization
• The work to conduct the research and to acquire the actual data takes
about 95% of the effort and t...
Adding Relevant Data
Visualization Elements
Data visualizations should be as simple as possible, with no extraneous elemen...
Common Data Visualization Elements
• A clear noun-phrase title
• Labels for the x- and y-axes (and sometimes y1 and y2
axe...
Graph Styles
• Various style versions of the target graph
• Background styles
• Object handling
• Texturing of objects and...
Range of Color Palettes
• Ability to add a variety of colors in
palettes that are aesthetically
pleasing and of sufficient...
To Change Graph Colors…
• To change the colors of the plot, highlight the
plot.
• In the Design tab of the ribbon, select ...
To Select Custom Colors…
• Custom colors may be applied to particular elements. Just right-click
the element, and change t...
168
Dropdown Menus with Additional Options
• Users have a high level of
control for the look, feel, and
function of the chart ...
MS Excel’s Page Layout Features
• Excel has a variety of layout features that may enable in-graph
editing.
• Some of the f...
Processing Graph Visualizations
Outside of Excel 2016
171
172
side-by-side data visualizations
from different software tools
Several Main Ways to Export Excel Charts
Copy and Paste as a Linked Graph
• Can export data visualizations as a
copy and p...
174
Several Main Ways to Export Excel Charts (cont.)
Copy and Paste as an Image into a
Digital Image Editing Software Program
...
Microsoft Visio
• For example, MS Visio offers the following: pre-made templates,
forms, containers, call-outs, connectors...
177
Built-in Templates and Online Templates for
Excel (for Defined Applications)
178
Add-ins to Excel 2016
179
180
Year Variable 1 Variable 2 Variable 3 Variable 4 Variable 5
2010 100 8 100 30 180
2011 4 1 7 4 0
2012 0 8 5 200 -180
2...
181
182
183
What are Add-ins?
• Add-ins are software programs built to function with Excel to add
various types of functionalities: da...
Where Can One Find Add-ins for Excel?
• Some of the Excel add-ins are from Microsoft Research and may be
activated within ...
Where Can One Find Add-ins for Excel? (cont.)
• There are different directions for accessing different types of add-ins.
•...
Activating Add-ins
• In Excel, click the File tab.
• Click Options. The Excel Options window opens.
• Click “Add-ins” in t...
Activating Add-ins (cont.)
• Select an add-in of interest, and click “Go” at the bottom.
• An “Add-in” window will open all...
Excel Options -> Add-ins Window
189
The “Add-ins” Window
190
A Note about Data
191
About Data
• Data…
• Has to be collected somewhere advertently or inadvertently
• Has to be practically applied in some wa...
About Data (cont.)
• Dataset metadata may be captured in data dictionaries if the dataset
is a larger sized one
• The fact...
About Data (cont.)
• Having access to a data table or a dataset can give the deceptive
sense of understanding
• Data has t...
Data Visualization Standards
195
Some Common Standards for Data
Visualizations
• Data accuracy (underlying data;
proper contextualization; source
citations...
Some Common Standards for Data
Visualizations (cont.)
• Human and machine readability
of data tables
• Contextualizing
197
Contact and Conclusion
• Dr. Shalin Hai-Jew
• iTAC
• Kansas State University
• 785-532-5262
• shalin@k-state.edu
• Note:
•...

Creating Effective Data Visualizations in Excel 2016: Some Basics


One of the mainstays of a modern software toolkit is Excel 2016, from Microsoft Office 2016. By reputation, Excel is considered a beginner’s tool that self-respecting data analysts would bypass, but Excel is fairly high-powered, can take up to about 1.05 million rows of data (1,048,576) per worksheet, contains complex statistical analysis capabilities (without the need for scripting), and enables rich data visualizations. It has a number of rich add-ons to empower different analytical and data visualization functionalities. It works as a great bridging tool to more complex types of statistical analyses.
This session walks participants through some basic built-in data visualizations in Excel 2016, including pie charts and doughnuts, bar charts, tree maps and sunburst diagrams, cluster diagrams, spider (radar) charts, scattergraphs, and others. This session will cover how data structures and desired emphases will determine the options for particular data visualizations.
In this session, participants will
review how to load a data table,
read the general data in a data table (or worksheet),
process or clean the data as needed,
use the Recommended Charts feature,
decide which built-in data visualizations to use, and
consider how to add relevant data visualization elements (including data labels, background grids, axis labels, and titles) for a coherent and effective data visualization.
Also, participants will help co-build data visualizations from open-source and other datasets.

Published in: Data & Analytics

Creating Effective Data Visualizations in Excel 2016: Some Basics

  1. 1. Creating Effective Data Visualizations in Excel 2016: Some Basics SIDLIT 2017 Aug. 3 – 4, 2017
  2. 2. Presentation Overview • One of the mainstays of a modern software toolkit is Excel 2016, from Microsoft Office 2016. By reputation, Excel is considered a beginner’s tool that self-respecting data analysts would bypass, but Excel is fairly high-powered, can take up to about 1.05 million rows of data (1,048,576) per worksheet, contains complex statistical analysis capabilities (without the need for scripting), and enables rich data visualizations. It has a number of rich add-ons to empower different analytical and data visualization functionalities. It works as a great bridging tool to more complex types of statistical analyses. • This session walks participants through some basic built-in data visualizations in Excel 2016, including pie charts and doughnuts, bar charts, tree maps and sunburst diagrams, cluster diagrams, spider (radar) charts, scattergraphs, and others. This session will cover how data structures and desired emphases will determine the options for particular data visualizations. 2
  3. 3. Presentation Overview (cont.) • In this session, participants will • review how to load a data table, • read the general data in a data table (or worksheet), • process or clean the data as needed, • use the Recommended Charts feature, • decide which built-in data visualizations to use, and • consider how to add relevant data visualization elements (including data labels, background grids, axis labels, and titles) for a coherent and effective data visualization. • Also, participants will help co-build data visualizations from open-source and other datasets. 3
  4. 4. Presentation Order • Sourcing Datasets • Reading General Data in a Data Table / Worksheet • Processing or Cleaning Data • Using the Recommended Charts Feature in Excel 2016 • Selecting Data Visualization Types • Column, line, pie, bar, area, X Y (scatter), stock, surface, radar, treemap, sunburst, histogram, box & whisker, waterfall, & combo • Going “Off-Script” within Excel • Some Common Mistakes 4
  5. 5. Presentation Order (cont.) • Adding Relevant Data Visualization Elements • Processing Graph Visualizations Outside of Excel 2016 • Add-ins to Excel 2016 • Streamgraphs, #hashtag networks on microblogging sites (on Twitter), related tags networks (on Flickr), article-article networks on MediaWiki (on Wikipedia), and others • Data Visualization Standards • A Note about Data 5
  6. 6. Sourcing Datasets 6
  7. 7. Sourcing Datasets • Downloading public datasets from sites like data.gov • Capturing the back data about how the publicly released datasets were released • Extracting data from online data portals (research sites, survey sites, learning management systems, social media platforms, and others) and converting those files into something readable in databases and Excel 7
  8. 8. Sourcing Datasets (cont.) • Downloading data from social media platforms • These may include Facebook poststreams, Twitter tweetstreams, scraped images from the Web and any image sharing sites, scraped videos from the Web and video-sharing sites, articles from Wikipedia, #hashtag networks from Twitter, keyword networks from Twitter, related tags networks from Flickr, email networks from email systems, and others • These datasets include both structured and semi-structured data 8
  9. 9. Sourcing Datasets (cont.) • Autogenerating data… • From online research suites (often used to test surveys) • From graph visualization tools (to see what randomized graphs look like) • Creating datasets in other software programs and saving out in a file format readable by Excel • Creating data manually in an Excel worksheet • Capturing data in Excel using third-party data downloaders, and others 9
  10. 10. Data Analytics Suites • Some datasets may be exported from data analytics suites. • SPSS, RapidMiner Studio, R, Python, and other tools may be used for high-level statistical analysis and machine learning. However, the data visualization tools may be more focused on conveying data than on presentation-quality data visualizations. • The underlying data may be exported in a form that Excel can use…in order to create the data visualizations. (Excel has a lot of analytics capabilities built-in, too, but complex analytics likely require processing in other software programs.) 10
  11. 11. Data Capture and Pre-Processing • Prior to importing data into Excel, it is likely that the data is pre-processed / cleaned for accuracy. • All data are changed with every touch of technology: • Software may be used to extract or capture data (such as from social media platforms). There are limits to APIs, which are virtually all limited by rate and by amount of data capturable for free. • Software may be used to convert manual coding into digital coding (transcoding). • Software may be used to turn unstructured and semi-structured data into quantitative-based data tables (such as text analytics applications). The reverse is common, too: taking quantitative data and turning it semi-structured (as visuals). • Software may be used to create synthetic or faux data that meets particular requirements (such as a random network graph). 11
  12. 12. Data Everywhere and Fungible • In other words, it is possible to datafy a lot of things. • There is data everywhere… • It is possible to turn most data into information and something somewhat useful. 12
  13. 13. Reading General Data in a Data Table / Worksheet 13
  14. 14. Structured Data • Structured data is labeled by row and column headers. • Such data is categorizable by type and by common characteristics and functions. • Rows tend to be data records (with unique identifiers in Column A). • Columns tend to be variables and attributes. • Data types include the following: General, Number, Currency, Accounting, Date, Time, Percentage, Fraction, Scientific, Text, Special, and Custom. • Each cell is labeled by data type, and these types affect how the software handles the data. 14
  15. 15. “Unstructured” or “Semi-Structured” Data • Text sets, bags of words • Image files • Audio files • Video files • Multimedia, and others • Tend to be multi-dimensional and / or high-dimensional data • Tend to be somewhat inherently structured based on the data type (language has some inherent structure; imagery may be defined within 2D or 3D space, etc.), thus the preference for “semi-structured” for word purists • Tend to be various file types, with different file extensions 15
  16. 16. Basic “Structured” Data Structures • Column A tends to contain unique identifiers for the row data • Row 1 tends to contain all the column headers • Column headers tend to be written in CamelCase format • Each row except the first contains an individual record • Each column contains one variable • Each column tends to contain data of a certain type, such as string / text, numerical, percentage, date, and others • Some of the data is human readable, and some is not (based on size of data #### or type) 16
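The layout conventions on this slide (headers in Row 1, unique identifiers in Column A, one record per row) can be checked mechanically before charting. What follows is only a minimal sketch, assuming the table has been saved as CSV text; the sample data and the function name check_structure are hypothetical and not part of the presentation.

```python
import csv
import io

# Hypothetical sample table: Row 1 holds headers, Column A holds record IDs.
SAMPLE = """RecordId,AgeGroup,SelfieCount
r1,Teens,20
r2,40s,49
r3,50s,25
"""


def check_structure(text):
    """Return a list of structural problems found in a simple CSV data table."""
    rows = list(csv.reader(io.StringIO(text)))
    header, records = rows[0], rows[1:]
    problems = []
    # Row 1: every column header should be distinct.
    if len(set(header)) != len(header):
        problems.append("duplicate column headers")
    # Column A: every record identifier should be unique.
    ids = [record[0] for record in records]
    if len(set(ids)) != len(ids):
        problems.append("duplicate identifiers in Column A")
    # Every record should have one cell per column header.
    for record in records:
        if len(record) != len(header):
            problems.append(
                f"record {record[0]} has {len(record)} cells, expected {len(header)}"
            )
    return problems


print(check_structure(SAMPLE))  # prints [] because the sample is well formed
```

An empty list means the table follows the conventions; any message points to a row or column to inspect in the worksheet.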
  17. 17. Coding “Structured” Data • Structured data generally has a long history of conventional statistical approaches to analysis, to identify patterns in the data. • There are simple counts. • There are measures of central tendency for parametric datasets. • There are tools for observing and measuring associations. • There are tools for observing and measuring causation-based associations. • There are tools to compare observed data vs. expected data, and measures of statistical significance. • There are tools to support experimental setups, to compare control groups with experimental groups. • There are tools to measure confidence in statistical findings. 17
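The first two items on this slide, simple counts and measures of central tendency, can be illustrated in a few lines. This is a rough sketch using Python's standard library rather than Excel itself; the scores list is made-up sample data. In a worksheet, the comparable built-in functions would be COUNTIF, AVERAGE, MEDIAN, and MODE.SNGL.

```python
import statistics
from collections import Counter

# Hypothetical survey responses on a 1-5 scale (not from the presentation).
scores = [3, 4, 4, 5, 2, 4, 3]

counts = Counter(scores)            # simple frequency count per value
mean = statistics.mean(scores)      # arithmetic mean
median = statistics.median(scores)  # middle value of the sorted data
mode = statistics.mode(scores)      # most frequent value

print(counts[4], median, mode)  # prints: 3 4 4
print(round(mean, 2))           # prints: 3.57
```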
  18. 18. Basic “Unstructured” and “Semi-Structured” Data Structures • Language data tends to have an inherent structure based on how evolved languages originate and change over time. • Image data tends to have an inherent structure based on image features: image sizes, orientation, main subject matter, colors, resolution, and other factors. • Audio data tends to have an inherent structure by voiceprint (and / or waveform), occurrences in time, sound frequencies, and other factors. • Video data tends to have an inherent structure by frames-per-minute imagery, waveforms, and other factors. 18
  19. 19. Coding “Unstructured” and “Semi-Structured” Data • So-called “unstructured” or “semi-structured” data are coded in a variety of different ways. • One approach is with a priori coding, or using an extant model, conceptual framework, or other structure to create a codebook against which the data are coded. • Another general approach is with “emergent” coding, which starts with the raw data and results in an evolved codebook. • Then, there are many combinations of the two above approaches. 19
  20. 20. Coding “Unstructured” and “Semi-Structured” Data (cont.) • Such unstructured / semi-structured data are multi-dimensional, so they can be analyzed in a variety of different ways and are somewhat robust against having a certain interpretation stick and predominate over others. • Data are generally polysemous or multi-meaninged. • There are public text corpora that have been created for broad-scale use in the testing of software tools, programs, algorithms, and processes for text analysis, in order to be able to have comparable and competitive analyses. 20
  21. 21. Coding “Unstructured” and “Semi-Structured” Data (cont.) • There are some text corpora which are non-consumptive, which means only the top-level statistics and other metrics about a text set are available, but the underlying texts (the actual data) themselves are not. “Shadow” datasets are made accessible for the queries, but to avoid the risk of re-identification of original copyrighted manuscripts, the original manuscripts in their original order are not made available. (Google Books Ngram Viewer is a well known example.) 21
  22. 22. Coding “Unstructured” and “Semi-Structured” Data (cont.) • Such data may be coded by humans alone, by computers alone, or by a cyborg-ian mix of the two. • Advances in computer vision (object identification, sentiment analysis of images, predictivity of “what happens next” in a video sequence) and other capabilities have extended what computers can do in coding such data. 22
  23. 23. 23
  24. 24. 24
  25. 25. 25
  26. 26. Unlinked or Linked Data Tables Flat Files • Data tables treated as single stand-alone files that may be assessed alone or queried in relation to other files Linked Files • Data tables treated as interconnected and related files that may be queried across data tables and fields 26
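The difference between a flat file and linked files can be made concrete with two small tables that share a key column. The sketch below is illustrative only: the tables, the StudentId key, and the function name courses_by_name are all hypothetical, and the lookup stands in for what a relational query (or a lookup formula across worksheets) would do.

```python
# Two hypothetical linked tables that share the key column "StudentId".
students = [
    {"StudentId": "s1", "Name": "Ana"},
    {"StudentId": "s2", "Name": "Ben"},
]
enrollments = [
    {"StudentId": "s1", "Course": "Stats"},
    {"StudentId": "s1", "Course": "Design"},
    {"StudentId": "s2", "Course": "Stats"},
]


def courses_by_name(students, enrollments):
    """Query across both tables: list each student's courses by name."""
    # Build a lookup from the shared key to the student's name.
    names = {s["StudentId"]: s["Name"] for s in students}
    result = {}
    for enrollment in enrollments:
        name = names[enrollment["StudentId"]]
        result.setdefault(name, []).append(enrollment["Course"])
    return result


print(courses_by_name(students, enrollments))
# prints: {'Ana': ['Stats', 'Design'], 'Ben': ['Stats']}
```

Treated as flat files, either table can still be assessed on its own; the query above only becomes possible once the shared key links them.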
  27. 27. Processing or Cleaning Data 27
  28. 28. Some Common Questions for Data Processing or Cleaning Structured Data • How should missing data be handled? (Should empty cells mean deleting the whole record? Should empty cells be filled with N/A? Should empty cells be filled with randomly-generated contents based on the other data in the set? Should empty cells be zeroed out?) • How should repeated data be handled? Unstructured or Semi-Structured Data • How should scraped imagery that consists of a corrupted file be handled? Should these be omitted? Should these be kept and partially coded? • In an image set, how should different versions of an image be coded? Should that be counted multiple times? What if the image is re-inscribed and reused by others in new ways? 28
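The options this slide raises for missing and repeated structured data can be compared side by side. The following is a minimal sketch, assuming each record is a dictionary and that an empty string marks an empty cell; the sample records and function names are hypothetical. It does not say which option is right for a given study; that remains a judgment call to document.

```python
# Hypothetical records; an empty string stands for an empty cell.
records = [
    {"Id": "r1", "Score": "4"},
    {"Id": "r2", "Score": ""},
    {"Id": "r1", "Score": "4"},  # exact repeat of the first record
]


def drop_incomplete(records):
    """Option 1: delete any record that has an empty cell."""
    return [r for r in records if all(value != "" for value in r.values())]


def fill_missing(records, placeholder="N/A"):
    """Options 2 and 4: fill empty cells with a placeholder such as N/A or 0."""
    return [
        {key: (placeholder if value == "" else value) for key, value in r.items()}
        for r in records
    ]


def drop_repeats(records):
    """Keep only the first occurrence of each fully identical record."""
    seen = set()
    unique = []
    for r in records:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique


print(len(drop_incomplete(records)))          # prints: 2
print(fill_missing(records)[1]["Score"])      # prints: N/A
print(fill_missing(records, "0")[1]["Score"]) # prints: 0
print(len(drop_repeats(records)))             # prints: 2
```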
  29. 29. Some Common Questions for Data Processing or Cleaning (cont.) Structured Data • In a set of parametric data, how should extreme outliers be handled? Should they be omitted, so as not to skew a curve? Should they be treated differently than the other data in the set? Unstructured or Semi-Structured Data • In a multi-lingual text set, how should the languages other than the base language be used? Should these language inputs be manually handled? Should these non-base language inputs be translated to the base language for machine analysis? 29
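One way to treat outliers "differently" without deleting them is to flag them. The sketch below uses the common 1.5 × IQR rule, which is an assumption of this example rather than a method named in the slides; the values are made up for illustration.

```python
import statistics

# Hypothetical measurements with one extreme value.
values = [10, 12, 11, 13, 12, 11, 95]

# Quartiles (default "exclusive" method of statistics.quantiles).
q1, _, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr

# Flag rather than delete, so the original data stays intact.
flagged = [v for v in values if v < low_fence or v > high_fence]
kept = [v for v in values if low_fence <= v <= high_fence]

print("flagged:", flagged)
print("kept:", kept)
```

Keeping both lists makes it possible to draw the chart with and without the flagged values and to report which version was used.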
  30. 30. Some Common Questions for Data Processing or Cleaning (cont.) Structured Data • There may be benefits to combining multiple open-source datasets, which each have insights to contribute to the study of a particular issue. The variables are not exactly mappable to each other though. How should such datasets be melded? How should the mixed dataset be described? How should the original datasets be credited? Unstructured or Semi-Structured Data • In a mixed multi-modal dataset of various multimedia contents, there is a lot of room for interpretation and subjectivity. What tools should be designed to aid in creating consistency in the coding and interpretations? 30
  31. 31. Some Common Questions for Data Processing or Cleaning (cont.) Structured Data • The original labeling of online data from an online research suite is too verbose and complex. Renaming the variables is important to enable easier data processing and easier setup of data tables. What is a legitimate process of renaming variables for accuracy and efficiency? Unstructured or Semi-Structured Data • Machine coding enables faster processing of various types of unstructured and semi-structured data. However, the machine coding also introduces some degree of ambiguity and “noise.” How should the use of computers to code be balanced against human-based insights? 31
  32. 32. Some Common Questions for Data Processing or Cleaning (cont.) Structured Data • In combining manual coding for a team-coded project, there are some new codes that were not part of the original codebook. Should these new nodes be included in the similarity analysis computation for a Cohen’s Kappa / Kappa Coefficient? Unstructured or Semi-Structured Data • In a particular study, there is a set of videos that has been hacked and taken from a company. The videos are relevant to the research and would offer value, but they are not legally available. Should these videos be used, or should they be expunged from the study set? 32
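For reference, Cohen's Kappa compares observed agreement between two coders with the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The sketch below is a minimal illustration with made-up codes from two hypothetical coders; it does not reproduce how any particular qualitative analysis tool computes the coefficient.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who labeled the same items."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("Both coders must label the same non-empty item set.")
    n = len(coder_a)
    # Observed agreement: share of items given the same code.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    counts_a = Counter(coder_a)
    counts_b = Counter(coder_b)
    # Any code used by either coder takes part in the chance term.
    labels = set(counts_a) | set(counts_b)
    expected = sum((counts_a[l] / n) * (counts_b[l] / n) for l in labels)
    if expected == 1.0:
        raise ValueError("Kappa is undefined when chance agreement is 1.")
    return (observed - expected) / (1 - expected)


# Hypothetical codes for ten items.
coder_a = ["pos"] * 5 + ["neg"] * 5
coder_b = ["pos", "pos", "pos", "pos", "neg", "neg", "neg", "neg", "neg", "pos"]

kappa = cohens_kappa(coder_a, coder_b)
print(round(kappa, 3))
```

Because the label set is the union of both coders' codes, a code added later by one coder lowers agreement unless the other coder also had the chance to apply it, which is one reason the question on the slide matters.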
  33. 33. Some Common Questions for Data Processing or Cleaning (cont.) Structured Data • An online survey system has accidentally captured respondent identifier information during the normal course of the data capture. The demographic data may be used for deeper analytics. Should this data be used? How so? Why or why not? Unstructured or Semi-Structured Data • Scraped online data come from a variety of sources, and the source citations may be hard to find. There are some tools that enable reverse image searches, but other search tools are more painstaking to use, particularly for video searches. How much effort should be put into having proper and correct citations for the original sources? 33
  34. 34. Some Common Questions for Data Processing or Cleaning (cont.) Structured Data • In many fields, original datasets have to be published out and shared at the time of publication. In the process of releasing data, a researcher has to go through a process of de-identification…and has to work hard to ensure that the data may not be re-identified. How much due diligence should a researcher go through to protect the participants of his / her / their study? Unstructured or Semi-Structured Data • There are automatically machine-created transcripts available for videos hosted on social video sharing sites. In some cases the transcripts are improved on with human coding, but in many cases they are not directly fixed and so include various mistakes. Should these transcripts be corrected first before they are coded for research? Or should they go in, mistakes and all, even if this means that some garble is included? 34
  35. 35. Some Common Questions for Data Processing or Cleaning (cont.) Structured Data • In some cases, conceptual data may be applied to communicate theories, models, and frameworks. Also, there may be synthetic or faux data. How should one communicate the fact that this data is conceptual? Unstructured or Semi-Structured Data • In an image set, there will be images of various types: photos, screen grabs of virtual worlds, screen stills of videos, diagrams, drawings, scans of documents, and other types of visuals. How should the various types of modalities be addressed? 35
  36. 36. General Points about Data Processing / Cleaning • There should be clear principles and rationales for how data is handled. These should be clearly documented. • Generally, data processing should not be lossy (lose information). • Data processing should be selective but non-destructive. • Data processing should not result in undue skew or bias to the original data. • Data processing should not result in data leakage, confidentiality compromises, re-identification of research participants, or any compromise of data privacy. 36
  37. 37. General Points about Data Processing / Cleaning (cont.) • There should be clear steps and processes applied to data processing, and these should be documented. If there are deviations from this processing, those should be recorded as well. • A raw set of all data should be preserved in its initial pristine state before any data processing or data cleaning is done. This is to ensure that there is a pristine master set against which to re-extract a new set for other processing…and also a master against which to compare cleaned datasets. • If this data processing is done by machine, the “macros” should be documented. 37
  38. 38. Using the “Recommended Charts” Feature in Excel 2016 38
  39. 39. Accessing the “Recommended Charts” Feature • To access this feature, highlight the desired data to map (from the dataset), and click the “Insert” tab, and select “Recommended Charts.” 39
  40. 40. About the “Recommended Charts” Feature • This “recommended charts” feature offers some cognitive scaffolding to new data visualizers by suggesting possible data visualizations. • This feature seems to offer the simplest options first, even after a user may have gone with more complex data visualizations for similarly structured data. 40
  41. 41. Selecting the Right Amount of Data to Map • Every selected cell of data, even an empty one, carries meaning in the data visualization and will be represented there. • The selected data should be structurally coherent. • In other words, the positioning of the respective cells should convey to the software how the data visualization should be drawn. Part of data preparation involves the positioning of the data in a correct structure. • It is possible to transpose elements on an axis and make other changes once the image is drawn, but it’s preferable to have the data structured correctly. • The selected variables in a dataset should be interconnected. If the data is not interconnected somehow (by meaning or by potential association), then it would be harder to justify having the same information in the same visualization. • Too much data will mean that Excel cannot draw the graph. Too little information will mean that the data visualization is not clear. • Data labels are usually handled, in part, in the data table itself. Those should be correctly set up, with proper spelling, proper capitalization, proper CamelCase (if used), and parallel construction. 41
  42. 42. Accessing the “All Charts” Feature • There is one tab that offers some “Recommended Charts”. The tab next to it offers “All Charts.” • Both are interactive and selectable. 42
  43. 43. The “Recommended Charts” Feature • This Excel feature assesses the types of data in the dataset or worksheet and proposes a few data visualizations that may best represent that data. • Sometimes, one needs to restart the software to get this to work. • Some other software tools (like IBM Watson) will actually preliminarily analyze the data and suggest aspects of the data to focus on for human analysts. 43
  44. 44. The “Recommended Charts” Feature (cont.) • If too much data has been highlighted, then a message will be shown. It will read in part “Recommendations are not available for the data you selected. To choose a chart type, click All Charts.” • Some reasons why a chart may not be identifiable include the following: • no numbers that are summarizable • data from multiple worksheets • a very large number of data cells • contains defined names (“range names”), columns defined as variables with particular characteristics • such as combinations of various columns linked by mathematical functions for a new variable 44
  45. 45. The “Recommended Charts” Feature (cont.) • This feature is a generalized one and does not include deep or unique or insider insights about the underlying data. • This means that the suggestions made may not be optimal for the dataset or the context of the researcher. • Researcher objectives will also affect the selection of the optimal data visualizations (and data visualization sequences). 45
  46. 46. 46
  47. 47. 47
  48. 48. 48
  49. 49. [Chart: Four Text Analytic Scores from a "Microsoft Excel" Article in Wikipedia. Categories: Analytic, Clout, Authentic, Tone. X-axis: Four Text Feature Scores. Y-axis: Percentile Rankings (0 to 100).] 49
  50. 50. 50
  51. 51. Abstracting Core Descriptive Functions in Data Visualizations • Proportionality (“intensity”) • Frequency counts • Pie charts, bar charts, intensity matrices, area charts, radar diagrams, histograms • Changes over time • Frequency counts over time • Line graphs, scattergraphs • Hierarchical relationships • Word networks, frequency word counts, topic modeling in word sets (text corpora) • Word network graphs, dendrograms, sunburst diagrams • Descriptive statistics • Distribution; central tendency (mean, median, mode), dispersion (standard deviation, min-max range, variance) • Bar charts, curves 51
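The descriptive statistics named above (central tendency and dispersion) can be computed before charting to check what a bar chart or curve should show. This is a minimal sketch using Python's standard `statistics` module on a made-up sample; the numbers are hypothetical.

```python
import statistics

# Hypothetical sample of counts.
sample = [2, 4, 4, 5, 7, 8]

summary = {
    # Central tendency
    "mean": statistics.mean(sample),
    "median": statistics.median(sample),
    "mode": statistics.mode(sample),
    # Dispersion
    "min": min(sample),
    "max": max(sample),
    "range": max(sample) - min(sample),
    "variance": statistics.variance(sample),  # sample variance (n - 1)
    "stdev": statistics.stdev(sample),        # sample standard deviation
}

for name, value in summary.items():
    print(f"{name}: {value}")
```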
  52. 52. Abstracting Core Descriptive Functions in Data Visualizations (cont.) • Social relationships • Intercommunications, follower- followee relationship • Social networks • Physical-spatial relationships • Events occurring in space, locations • Geographical maps 52
  53. 53. Abstracting Core Analytical Functions in Data Visualizations: Deductive, Inductive, Inferential • Data relationships • Associations, causations • Scattergraphs, line graphs, line plots • Text analysis • Word frequency counts, text queries, topic modeling (unsupervised theme extraction), sentiment analysis • Cluster diagram, word cloud, word tree, dendrogram, bar chart, intensity matrix 53
  54. 54. Filtering Data • The “Sort & Filter” option enables users to select a column or segment of a column to alphabetize or sort numerically (from most to least, from least to most) or sort by date (from most-recent to least-recent, or least-recent to most-recent), and so on. 54
  55. 55. Filtering Data (cont.) • A “Sort Warning” window asks whether the user wants to “Expand the selection” or just “Continue with the current selection” • Generally, the selection should be expanded. This means that the entire row of data will move with however the selection moves. The data will still be correct and of-a-piece. • If not, only the selection will be sorted, and all the other row data will be left in their prior positions. If there is a very limited issue that is being addressed and the whole dataset is pristine and accurate elsewhere, then just continuing with the current selection may be the right choice. 55
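The difference between the two "Sort Warning" choices can be illustrated outside of Excel. In this sketch, a table is a list of (group, count) rows using three of the selfie counts that appear in the deck's sample data; sorting whole rows mimics "Expand the selection," while sorting one column by itself mimics "Continue with the current selection."

```python
# Each tuple is one row: (group, selfie count).
table = [("Teens", 20), ("Children", 63), ("Babies", 7)]

# "Expand the selection": whole rows move together, so labels stay correct.
expanded = sorted(table, key=lambda row: row[1])

# "Continue with the current selection": only the count column is sorted,
# and the group column stays where it was.
groups = [row[0] for row in table]
counts_only_sorted = sorted(row[1] for row in table)
mismatched = list(zip(groups, counts_only_sorted))

print(expanded)    # rows intact
print(mismatched)  # counts now sit next to the wrong groups
```

The second result shows why expanding the selection is generally the safer choice: "Teens" ends up paired with a count that belongs to "Babies."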
  56. 56. Selecting Data Visualization Types 56
  57. 57. What Follows • In the following section are the main types of data graphs enabled in Excel with its built-in charting features. • Each section begins with the type of chart and some general characteristics, followed by examples. • A majority of the examples are drawn with open-source real-world data. One data visualization was created using synthetic data for effects, and that visualization has been labeled as being created using faux data. • The data may have been processed using other tools, but the graphs themselves were all created in Excel. • On the same slide as the graph or directly after each graph is a table with the underlying data, to help viewers understand the connection between the data and the visualization. 57
  58. 58. Column Charts • Column graphs tend to be vertical (vs. horizontal). • In other words, they tend to align with the placement of a column. • Column graphs may be 2D or 3D. • The common shapes representing data are rectangles. • Columns may be stacked. • Related columns may be clustered. • Columns may be summed to 100% in “100% stacked column (or bar) charts.” 58
  59. 59. 59
  60. 60. Data Structure for the Vertical Column or Bar Chart on the Prior Slide pronoun 2.47; ppron 0.36; i 0.07; we 0.08; you 0.07; shehe 0.00; they 0.15; ipron 2.10; article 5.77; prep 10.78; auxverb 3.74; adverb 1.69; conj 4.07; negate 0.52 60
  61. 61. 61
  62. 62. Data Structure for the Stacked Bar Chart in the Prior Slide Columns: Very Negative, Moderately Negative, Moderately Positive, Very Positive. SpaceX Public Group FB: 210, 305, 389, 245. Tesla Motors Club FB: 12, 20, 45, 23. 62
  63. 63. Group (Selfies, Dronies): Babies 7, 3; Children 63, 21; Teens 20, 9; 20s and 30s 943, 168; 40s 49, 10; 50s 25, 7; 60s 12, 2; 70s and older 1, 4; Mixed Age 224, 116; Unknowable 27, 185 63
  64. 64. Line Charts • Line graphs tend to be horizontal • Line graphs may represent changes over time • In such cases, time is represented on the x-axis, and some variable with a numerical measure (counts, percentages, frequencies, intensities) is represented in the y-axis • Time units should be consistent • Line graphs with time on the x-axis may be enhanced with a drawn “trendline” to indicate directionality of the phenomena over the studied / observed time frame and into the near future. • Comparative line graphs may show multiple related factors (variables) interacting over time with each other 64
  65. 65. Line Charts (cont.) • Line graphs may have two different variables with one represented on the x-axis and one on the y • The line itself then may show some association between the two variables (which should be continuous variables) • The associations may be negative or positive • The associations may be more complex and curvilinear (not staying consistent one way or another over time) • Or there may be no apparent association • Where a bar graph (the prior one) suggests a discretization (and “space”) between the bar elements, a line graph suggests more nuance and some continuity in the variables (and less space or no space between variables, expressed as a dotted line or a continuous line). 65
  66. 66. Month (Count): Sept. 2015, 83; Oct. 2015, 222; Nov. 2015, 51; Dec. 2015, 15; Jan. 2016, 4; Feb. 2016, 2 66
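The “trendline” mentioned for line graphs is commonly a least-squares straight line. The sketch below fits one to the six monthly counts on this slide, treating the months as positions 0 through 5; the function name and the month-to-index mapping are assumptions of this example, and Excel offers other trendline types that are not shown here.

```python
# Monthly counts from Sept. 2015 through Feb. 2016.
counts = [83, 222, 51, 15, 4, 2]
x = list(range(len(counts)))  # assumption: evenly spaced months, 0..5


def linear_trend(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((a - mean_x) ** 2 for a in xs)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = linear_trend(x, counts)
fitted = [intercept + slope * xi for xi in x]

print(f"slope per month: {slope:.2f}")
print([round(v, 1) for v in fitted])
```

A negative slope indicates an overall decline over the observed months, even though the raw series peaks in the second month; a straight line hides that peak, which is worth remembering when a trendline is drawn over a short series.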
  67. 67. 67 HLY-TEMP-NORMAL HLY-DEWP-NORMAL 34.9 28.9 34.4 28.7 33.9 28.4 33.4 28.3 33.1 28 32.7 27.9 32.5 27.7 32.3 27.6 32.1 27.4 33.8 28.2 36.4 28.9 39.4 29.3 42.1 29.2 44.2 29.1 45.6 28.9 46.2 28.5 45.8 28.5 44.1 28.6 41.2 28.5 39.6 28.8 38.2 29 37.2 29 36.3 29 35.5 29 34.9 28.8 34.4 28.7 33.9 28.4 33.5 28.3 33.1 28 32.7 27.9 32.5 27.7 32.3 27.6 32.1 27.5 33.8 28.3 36.6 29.1 39.6 29.5
  68. 68. 68 Very Negative Moderately Negative Moderately Positive Very Positive SpaceX Public Group FB 210 305 399 245 Tesla Motors Club FB 12 20 45 23
  69. 69. Pie Charts • Pie charts are used to represent (related) proportions of a whole. Proportions are determined numerically—by raw counts or percentages, usually. • The respective proportions are represented as “slices.” • Pie charts may be 2D or 3D. • One version of a “pie chart” is a doughnut, which is a circular proportional representation. • “Exploding” pie charts have sections that are pulled out from the main pie as a point-of-emphasis. 69
  70. 70. creation_pending 65; deleted 2112; pre_registered 13795; registered 84351; Total 100323 70
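A pie chart turns raw counts into proportions of a whole. The sketch below does that arithmetic for the four status counts on this slide, which helps to check that the slices add up to 100% and to see which slices will be too thin to read.

```python
# Status counts from the slide's data table.
counts = {
    "creation_pending": 65,
    "deleted": 2112,
    "pre_registered": 13795,
    "registered": 84351,
}

total = sum(counts.values())

# Each slice as a percentage of the whole.
shares = {name: 100 * value / total for name, value in counts.items()}

for name, share in shares.items():
    print(f"{name}: {share:.2f}%")
print("total:", total)
```

Very small slices such as the first one here are hard to see in a pie chart, which may be a reason to prefer a bar chart or to call out the value in a label.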
  71. 71. DELETE (Remove) 163; GET (Read) 824038; HEAD (Retrieve Resource) 204; PATCH (Update, Modify) 12; POST (Create) 218950; PUT (Create) 5209; Total 1048576 71
  72. 72. Course 32936; Group 14576; Total 47512 72
  73. 73. Bar Charts • Bar charts use rectangular shapes (and the sizes of these shapes) to indicate quantities and intensities. • Bar charts may have bars be either horizontal or vertical. • Bar charts may be 2D or 3D. • The bars may be stacked; they may be clustered. • Bars may be summed to 100% in “100% stacked column (or bar) charts.” • Note: The bar chart types are as follows: vertical stacked bar chart, 100% stacked horizontal bar chart, and a Pareto chart. 73
  74. 74. auto_graded 239430; human_graded 1070322; not_graded 787398; Total 2097150 74
  75. 75. 75
  76. 76. Data Structure for the 100% Bar Chart on the Prior Slide Columns: Very Negative, Moderately Negative, Moderately Positive, Very Positive. SpaceX Public Group FB: 210, 305, 399, 245. Tesla Motors Club FB: 12, 20, 45, 23. 76
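A 100% stacked chart rescales each row so that its segments sum to 100%, which makes groups of very different sizes comparable. The sketch below applies that rescaling to the two rows on this slide; the variable names are illustrative only.

```python
categories = ["Very Negative", "Moderately Negative",
              "Moderately Positive", "Very Positive"]

rows = {
    "SpaceX Public Group FB": [210, 305, 399, 245],
    "Tesla Motors Club FB": [12, 20, 45, 23],
}

# Convert each row's raw counts to percentages of that row's total.
percent_rows = {
    name: [100 * value / sum(values) for value in values]
    for name, values in rows.items()
}

for name, shares in percent_rows.items():
    labeled = ", ".join(f"{c}: {s:.1f}%" for c, s in zip(categories, shares))
    print(f"{name} -> {labeled}")
```

The rescaling removes the difference in group size, so the chart should state the raw totals somewhere; otherwise a group of 100 posts looks as well supported as a group of more than a thousand.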
  77. 77. 77
  78. 78. Data Structure and Source for Stacked Bar Charts in Prior Slide • Data from “Comparative Analysis of 4-H Enrollment and U.S. Census School Data” • conditional data distribution • REEIS Report • July 2010 • 4H38-Comparitive(sic)-Analysis-of-4H-Enrolment(sic)-US-census-school-grade-data.xlsx Region Name: <All>; State Name: <All>. Grade (4-H Enrollment / US Census): Kindergarten 3,091,210 / 27,624,237; 1st Grade 3,090,230 / 27,945,407; 2nd Grade 3,733,658 / 28,086,468; 3rd Grade 5,327,058 / 27,494,293; 4th Grade 6,851,803 / 28,503,328; 5th Grade 6,383,579 / 28,461,965; 6th Grade 4,522,690 / 28,102,045; 7th Grade 2,955,880 / 27,757,994; 8th Grade 2,465,910 / 27,640,711; 9th Grade 1,625,339 / 27,576,100; 10th Grade 1,444,406 / 27,584,713; 11th Grade 1,258,010 / 27,644,003; 12th Grade 989,248 / 27,744,150 78
  79. 79. Main Time Zones (count): Alaska 3; Arizona 7; Asuncion 1; Auckland 1; Baghdad 1; Bangkok 1; Beijing 1; Bogota 1; Brasilia 2; Bucharest 1; Central America 1; Central Time (US & Canada) 34765; Copenhagen 1; Eastern Time (US & Canada) 130; Harare 1; Hawaii 3; Indiana (East) 2; Islamabad 1; Jerusalem 1; La Paz 1; London 1; Mid-Atlantic 1; Moscow 1; Mountain Time (US & Canada) 59; Nairobi 1; Pacific Time (US & Canada) 46; Paris 1; Rome 3; Seoul 2; Tokyo 1; Total 35041 79
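A Pareto chart, one of the bar chart types named earlier, orders categories from largest to smallest and overlays a cumulative percentage. The sketch below prepares that ordering for a five-entry subset of the time zone counts on this slide; the subset and the use of the subset's own total as the denominator are simplifications made for this example.

```python
# Subset of the time zone counts (assumption: five entries only, for brevity).
counts = {
    "Arizona": 7,
    "Pacific Time (US & Canada)": 46,
    "Central Time (US & Canada)": 34765,
    "Mountain Time (US & Canada)": 59,
    "Eastern Time (US & Canada)": 130,
}

# Step 1: sort categories from the largest count to the smallest.
ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)

# Step 2: accumulate a running percentage of the subset total.
subset_total = sum(counts.values())
pareto = []
running = 0
for name, count in ordered:
    running += count
    pareto.append((name, count, 100 * running / subset_total))

for name, count, cumulative in pareto:
    print(f"{name}: {count} ({cumulative:.2f}% cumulative)")
```

With data this lopsided, the first bar accounts for nearly all of the total, so the cumulative line is almost flat; a log scale or a separate chart for the small categories may communicate more.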
  80. 80. Area Charts • Area charts are built from line charts. • In these, the areas under the respective lines are filled in with certain colors and / or textures to indicate quantitative data (frequencies, amounts, intensities, etc.). • The data may consist of one record or multiple comparable records. • The areas are usually somewhat transparent so that other related data records remain visible and quantities can be compared. 80
  81. 81. 81 Null 769,964 1 246,371 2 20,907 3 6749 4 1762 5 1,239 6 368 7 212 8 309 9 138 10 104 11 39 12 20 13 32 14 24 15 23 16 27 17 14 18 8 19 10 20 13 21 8 22 153 23 7 24 7 25 6 26 8 27 6 28 1 29 6 30 7 31 4 32 2 33 1 34 4 35 4 36 1 38 2 39 4 40 3 41 1 43 1 45 2 58 1 59 1 62 2 108 1
  82. 82. Group (Selfies, Dronies): Babies 7, 3; Children 63, 21; Teens 20, 9; 20s and 30s 943, 168; 40s 49, 10; 50s 25, 7; 60s 12, 2; 70s and older 1, 4; Mixed Age 224, 116; Unknowable 27, 185 82
  83. 83. 83 Year (All) Data Region State Detail of Male Detail of Female Detail of Youth Detail of 4H Units CENTRAL Illinois 204,152 223,692 427,844 10,356 Indiana 236,705 270,895 507,600 53,547 Iowa 118,213 130,830 249,043 18,215 Kansas 79,472 82,265 161,737 17,218 Michigan 431,160 494,569 925,729 42,882 Minnesota 689,203 709,449 1,398,652 51,972 Missouri 82,138 90,864 173,002 13,016 Nebraska 210,271 254,071 464,342 22,238 North Dakota 88,858 109,733 198,591 9,555 Ohio 210,287 229,454 439,741 26,461 South Dakota 36,779 42,570 79,349 3,502 Wisconsin 347,307 435,679 782,986 41,115 EASTERN Connecticut 13,810 18,777 32,587 806 Delaware 28,683 34,198 62,881 2,027 District Of Columbia 4,746 7,360 12,106 14 Maine 44,324 55,631 99,955 2,290 Maryland 51,653 57,881 109,534 2,120 Massachusetts 66,039 77,434 143,473 7,277 New Hampshire 23,413 26,505 49,918 1,596 New Jersey 56,877 68,528 125,405 5,124 New York 485,408 538,688 1,024,096 55,771 Pennsylvania 60,696 62,811 123,507 3,909 Rhode Island 8,131 8,449 16,580 622 Vermont 9,544 13,366 22,910 1,070 West Virginia 81,066 102,271 183,337 11,883 SOUTHERN Alabama 55,709 63,794 119,503 3,457 American Samoa 1,470 1,563 3,033 33 Arkansas 185,849 214,993 400,842 16,074 Florida 145,672 165,895 311,567 10,723 Georgia 138,478 169,541 308,019 27,291 Kentucky 393,835 440,087 833,922 41,215 Louisiana 60,566 89,470 150,036 1,619 Mississippi 157,890 168,154 326,044 13,872 North Carolina 338,669 410,066 748,735 26,291 Oklahoma 162,894 189,696 352,590 13,111 Puerto Rico 55,376 63,447 118,823 4,194 South Carolina 94,726 113,897 208,623 6,772 Tennessee 48,558 55,487 104,045 6,872 Texas 1,676,187 1,737,070 3,413,257 15,562 VirginIslands 1,270 1,673 2,943 116 Virginia 126,927 134,656 261,583 11,760 WESTERN Alaska 19,513 21,081 40,594 425 Arizona 99,523 105,077 204,600 5,173 California 90,022 94,206 184,228 2,548 Colorado 95,659 115,351 211,010 7,956 Guam 16,447 17,307 33,754 215 Hawaii 16,996 18,050 35,046 838 Idaho 13,579 16,945 30,524 2,581 
Micronesia 3,992 4,484 8,476 81 Montana 22,332 29,073 51,405 8,766 Nevada 49,466 59,650 109,116 1,809 New Mexico 110,287 114,815 225,102 2,728 Northern Mariana Islands 2,766 3,093 5,859 107 Oregon 20,999 25,360 46,359 3,324 Utah 86,325 113,686 200,011 4,156 Washington 70,648 79,181 149,829 6,649 Wyoming 29,581 36,899 66,480 4,038 Grand Total 8,061,146 9,019,717 17,080,863 654,942
  84. 84. Group (Selfies, Dronies): Babies 7, 3; Children 63, 21; Teens 20, 9; 20s and 30s 943, 168; 40s 49, 10; 50s 25, 7; 60s 12, 2; 70s and older 1, 4; Mixed Age 224, 116; Unknowable 27, 185 84
  85. 85. 85 Date selfie selfie guy 1/1/2004 -0.583 -0.608 2/1/2004 -0.583 -0.608 3/1/2004 -0.583 -0.608 4/1/2004 -0.583 -0.608 5/1/2004 -0.583 -0.608 6/1/2004 -0.583 -0.608 7/1/2004 -0.583 -0.608 8/1/2004 -0.583 -0.608 9/1/2004 -0.583 -0.608 ######## -0.583 -0.608 ######## -0.583 -0.608 ######## -0.583 -0.608 1/1/2005 -0.583 -0.608 2/1/2005 -0.583 -0.608 3/1/2005 -0.583 -0.608 4/1/2005 -0.583 -0.608 5/1/2005 -0.583 -0.608 6/1/2005 -0.583 -0.608 7/1/2005 -0.583 -0.608 8/1/2005 -0.583 -0.608 9/1/2005 -0.583 -0.608 ######## -0.583 -0.608 ######## -0.583 -0.608 ######## -0.583 -0.608 1/1/2006 -0.583 -0.608 2/1/2006 -0.583 -0.608 3/1/2006 -0.583 -0.608 4/1/2006 -0.583 -0.608 5/1/2006 -0.583 -0.608 6/1/2006 -0.583 -0.608 7/1/2006 -0.583 -0.608 8/1/2006 -0.583 -0.608 9/1/2006 -0.583 -0.608 ######## -0.583 -0.608 ######## -0.583 -0.608 ######## -0.583 -0.608 1/1/2007 -0.583 -0.608 2/1/2007 -0.583 -0.608 3/1/2007 -0.583 -0.608 4/1/2007 -0.583 -0.608 5/1/2007 -0.583 -0.608 6/1/2007 -0.583 -0.608 7/1/2007 -0.583 -0.608 8/1/2007 -0.583 -0.608 9/1/2007 -0.583 -0.608 ######## -0.583 -0.608 ######## -0.583 -0.608 ######## -0.583 -0.608 1/1/2008 -0.583 -0.608 2/1/2008 -0.583 -0.608 3/1/2008 -0.583 -0.608 4/1/2008 -0.583 -0.608 5/1/2008 -0.583 -0.608 6/1/2008 -0.583 -0.608 7/1/2008 -0.583 -0.608 8/1/2008 -0.583 -0.608 9/1/2008 -0.583 -0.608 ######## -0.583 -0.608 ######## -0.583 -0.608 ######## -0.583 -0.608 1/1/2009 -0.583 -0.608 2/1/2009 -0.583 -0.608 3/1/2009 -0.583 -0.608 4/1/2009 -0.583 -0.608 5/1/2009 -0.583 -0.608 6/1/2009 -0.583 -0.608 7/1/2009 -0.583 -0.608 8/1/2009 -0.583 -0.608 9/1/2009 -0.583 -0.608 ######## -0.583 -0.608 ######## -0.583 -0.608 ######## -0.583 -0.608 1/1/2010 -0.583 -0.608 2/1/2010 -0.583 -0.608 3/1/2010 -0.582 -0.608 4/1/2010 -0.583 -0.608 5/1/2010 -0.583 -0.608 6/1/2010 -0.582 -0.608 7/1/2010 -0.582 -0.608 8/1/2010 -0.582 -0.608 9/1/2010 -0.583 -0.608 ######## -0.582 -0.608 ######## -0.582 -0.608 ######## -0.582 -0.608 1/1/2011 -0.582 -0.608 2/1/2011 
-0.582 -0.608 3/1/2011 -0.582 -0.608 4/1/2011 -0.582 -0.608 5/1/2011 -0.582 -0.608 6/1/2011 -0.581 -0.608 7/1/2011 -0.581 -0.608 8/1/2011 -0.581 -0.608 9/1/2011 -0.581 -0.608 ######## -0.581 -0.608 ######## -0.581 -0.608 ######## -0.581 -0.608 1/1/2012 -0.579 -0.608 2/1/2012 -0.577 -0.608 3/1/2012 -0.579 -0.608 4/1/2012 -0.55 -0.608 5/1/2012 -0.576 -0.608 6/1/2012 -0.575 -0.608 7/1/2012 -0.573 -0.608 8/1/2012 -0.57 -0.608 9/1/2012 -0.566 -0.608 ######## -0.553 -0.608 ######## -0.546 -0.608 ######## -0.521 -0.608 1/1/2013 -0.512 -0.608 2/1/2013 -0.487 -0.608 3/1/2013 -0.463 -0.387 4/1/2013 -0.428 -0.376 5/1/2013 -0.383 -0.314 6/1/2013 -0.314 -0.069 7/1/2013 -0.142 0.033 8/1/2013 -0.135 0.14 9/1/2013 -0.096 0.359 ######## 0.178 0.8 ######## 0.326 0.786 ######## 1.33 1.282 1/1/2014 0.797 1.205 2/1/2014 1.043 1.413 3/1/2014 3.312 2.333 4/1/2014 2.671 3.007 5/1/2014 1.814 2.222 6/1/2014 1.501 2.251 7/1/2014 1.594 2.005 8/1/2014 1.781 1.983 9/1/2014 1.648 1.799 ######## 1.637 1.863 ######## 1.668 1.594 ######## 2.114 1.958 1/1/2015 2.018 1.857 2/1/2015 1.779 1.449 3/1/2015 1.807 1.55 4/1/2015 2.02 1.69 5/1/2015 2.117 1.847 6/1/2015 2.279 1.924 7/1/2015 2.141 2.166 8/1/2015 1.717 1.851 9/1/2015 1.604 1.461 ######## 1.393 1.412 ######## 1.366 1.54 ######## 1.748 1.829 1/1/2016 1.437 1.912 2/1/2016 1.231 1.466 3/1/2016 3.67 1.657 4/1/2016 1.385 1.445 5/1/2016 1.284 1.751 6/1/2016 1.377 1.578 7/1/2016 1.389 1.375 8/1/2016 1.091 1.612 9/1/2016 0.983 1.241 ######## 1.026 1.162 ######## 1.193 1.04 ######## 1.387 1.193 1/1/2017 1.01 1.113 2/1/2017 0.915 0.836 3/1/2017 0.85 1.02
  86. 86. Data Structure and Source for Area Chart in Prior Slide • Comparison of search frequencies for “selfie” and “selfie guy” on Google Search from 2004 – 2017 (June) • Selection of two columns of less-correlated, less co-varying web search activity data from the related downloadable .csv file • Correlations are over time with normalized data (z-scores) over weekly and monthly intervals in the time period • Extracted from Google Correlate data 86
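The z-scores mentioned above express each value as a number of standard deviations from the series mean, which is why long stretches of the table sit at the same negative value before the search terms take off. The sketch below shows the calculation on a made-up series; using the population standard deviation is an assumption of this example, since the slides do not say which form the source data used.

```python
import statistics


def z_scores(values):
    """Standardize values: (value - mean) / population standard deviation."""
    mean = statistics.fmean(values)
    spread = statistics.pstdev(values)  # assumption: population form
    if spread == 0:
        raise ValueError("z-scores are undefined when all values are equal.")
    return [(v - mean) / spread for v in values]


# Hypothetical search-volume counts.
hypothetical_counts = [2, 4, 4, 4, 5, 5, 7, 9]
scores = z_scores(hypothetical_counts)

print([round(s, 2) for s in scores])
```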
  87. 87. X Y (Scatter) Charts • Scatter graphs (aka scatter plots or scatter diagrams) capture two sets of point data. • On the respective x-axis and y-axis, different variables are represented. These variables are often continuous (vs. discrete) ones. • Sometimes, lines are drawn through the data to help in visualizing positive associations (an increase in one accompanies an increase in the other), negative associations (an increase in one accompanies a decrease in the other), no relations, or curvilinear relations (more complex associations than linear ones). • Of course, correlation does not imply causation. 87
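The strength and direction of a linear association in a scatter graph is often summarized with the Pearson correlation coefficient. The sketch below computes it by hand for the first five open and close values from the deck's stock data table; the function name is illustrative, and five points are far too few for any real conclusion.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("Need two sequences of equal length (at least 2).")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("Correlation is undefined for a constant sequence.")
    return sxy / math.sqrt(sxx * syy)


# First five rows of the open and close columns.
open_prices = [182.53, 181.75, 179.42, 178.74, 178.47]
close_prices = [183.385, 182.06, 180.38, 179.3, 178.4]

r = pearson_r(open_prices, close_prices)
print(f"r = {r:.3f}")
```

A value near +1 indicates a strong positive linear association, near -1 a strong negative one, and near 0 little linear association; curvilinear relations can still produce a value near 0, so the scatter graph itself should always be inspected.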
  88. 88. 88 open high low close 182.53 183.4 182.53 183.385 181.75 182.46 181.61 182.06 179.42 180.93 179.42 180.38 178.74 179.82 178.346 179.3 178.47 179.9 178.16 178.4 178.59 179.97 177.12 177.85 175.84 179.08 175.65 179.02 175.74 176.88 175.5639 175.62 178.25 178.25 175.94 176.05 177.5 178.6 176.96 178.57 179 179.97 177.48 177.56 178.39 179.09 177.26 178.85 177.56 178.22 177.12 177.37 179 180.18 176.89 177.08 176.88 178.79 176.76 178.7 177.08 177.73 175.5 176.65 178.02 178.18 176.81 176.86 177.25 178.49 177.22 177.98 177.4 177.99 176.97 177.63 176.29 177.683 175 177.36 174.37 176.44 173.75 176.1 176.85 177.53 174.7687 175.82 177.34 177.85 176.59 177.26 175.96 177.1 175.5 176.98 179.99 180.25 175.5 175.96 180.1 180.15 179.14 179.39 178.31 180.3835 178.17 180.1 179.82 180 177.64 178.19 179 179.24 177.97 178.71 178.54 179.69 177.71 178.73 177.16 179.19 177.07 179.05 181.9 181.97 177.92 178.7 181.43 182.59 179.58 180.57 182.4 182.694 181.49 181.74 180.64 182.84 180.6209 182.02 181.4 182.3 180.43 180.93 183.04 183.58 181.45 182.18 184 185.71 182.97 182.99 181.85 184.8 181.82 183.91 180.34 181.93 179.67 180.23 178 179.8839 177.55 179.43 176.75 178.8 176.1 177.44 175.97 177 175.7 176.86 174.98 175.75 174.01 175.36 173.92 176.17 173.68 175.56 170.41 173.25 170.4 172.71 169.64 170.81 168.75 170.81 168.42 169.95 168.35 169.3 167.7 168.8 167.22 168.5 166.45 169.07 166.35 168.03 165.25 166.45 164.47 166.23 164.67 165.086 164.06 164.28 165 165.24 163.69 163.81 165 167.4199 164.8725 166.5 162.42 164.08 162.38 163.98 162.99 163.56 162.31 162.4 163.22 163.97 160.82 162.26 164.25 165.81 163.12 163.97 164.96 165.1 163.22 163.42 165.92 165.99 163.82 165.57 169.21 169.8 167.01 167.7 167.25 170 167.25 169.12 163.59 168.65 163.24 167.36 158.58 160.93 157.84 160.55
  89. 89. 89 http://www.nasdaq.com/symbol/pg/historical
  90. 90. Data Structure of the Scatter Graph from the Prior Slide 90 date close volume open high low 15:26 90.13 5,975,886 89.55 90.25 89.47 ######## 89.55 8274540 89.12 89.655 89.01 ######## 88.62 9060987 89.17 89.28 88.61 ######## 89.33 6979447 89.7 89.7 89.31 ######## 89.6 6915161 90.1 90.45 89.5 ######## 90.8 7087668 90.4 91.13 90.34 ######## 90.39 6931622 90.23 90.59 90.13 ######## 90.03 5036793 90.05 90.54 89.79 ######## 90.31 6286282 89.7 90.421 89.56 ######## 89.8 5184689 89.66 89.82 89.29 ######## 89.49 5949792 89.13 89.679 88.75 4/7/2017 89.23 4739026 89.46 89.61 89.18 4/6/2017 89.4 7231494 89.77 89.81 89.31 4/5/2017 89.97 6310419 90 90.5 89.76 4/4/2017 89.91 5680493 89.75 89.96 89.42 4/3/2017 89.68 6967439 89.86 90.06 89.45 ######## 89.85 6942342 90.03 90.34 89.84 ######## 90.2 3691924 90.5 90.568 90.1 ######## 90.6 4252780 90.56 90.87 90.39 ######## 90.76 11855760 90.17 91.0599 90.13 ######## 90.49 10066350 90.45 90.715 90.18 ######## 90.57 9251849 90.77 90.915 90.21 ######## 90.77 6824203 90.91 91.46 90.6 ######## 90.99 7791844 91.31 91.8 90.75 ######## 91.19 8212432 91.3 91.75 91.03 ######## 91.22 7772903 90.96 91.41 90.94 ######## 91 37004960 91.45 92 90.92 ######## 91.44 6600312 91.44 91.7199 91.1 ######## 91.4 7855256 91 91.78 90.73 ######## 91 6684468 91.26 91.59 90.82 ######## 91.31 7090852 91.06 91.49 90.85 ######## 91.07 6794349 90.8 91.16 90.65 3/9/2017 90.34 5587367 90.14 90.49 90.095 3/8/2017 90.14 5511775 90.02 90.35 89.76 3/7/2017 90.29 5362951 90.14 90.49 90.06 3/6/2017 90.37 6491060 89.89 90.51 89.59 3/3/2017 90.5 8337132 90.8 90.82 89.8948 3/2/2017 90.91 7027222 91.4 91.585 90.88 3/1/2017 91.66 8787804 91.05 91.89 90.71 ######## 91.07 10535470 90.9 91.79 90.65 ######## 90.89 11840270 90.53 90.91 90.06 ######## 91.05 6639827 90.97 91.34 90.67 ######## 91.13 7611607 91.61 91.8 90.91 ######## 91.44 6749856 91.45 91.8 91.25 ######## 91.67 9018703 90.58 91.8 90.58 ######## 91.09 12088310 90.61 91.36 90.45 ######## 90.79 12404800 90.92 91.12 
90.54 ######## 91.12 25842240 89.81 91.15 89.81 ######## 87.86 20073350 88 88.22 87.23 ######## 88.31 6894184 88.04 88.355 87.64 ######## 87.97 11050280 88.56 88.73 87.965 2/9/2017 88.67 9913696 88.25 88.78 88.03 2/8/2017 88.33 6815666 88.07 88.34 87.81 2/7/2017 88.01 6646055 87.64 88.28 87.475 2/6/2017 87.4 7474975 87.5 87.785 87.13 2/3/2017 87.41 7154381 88.12 88.17 87.39 2/2/2017 87.76 8996855 87.61 88.35 87.25 2/1/2017 87.33 8291504 87.03 87.59 86.75 ######## 87.6 9694265 86.63 87.66 86.53 ######## 86.75 7349934 86.78 86.86 86.5 ######## 86.72 9324287 86.45 86.85 86.02 ######## 86.6 6540385 87.12 87.23 86.59 ######## 87.16 8081945 87.84 87.9 87.06 ######## 87.86 8704054 87.22 87.95 87.22
  91. 91. 91
  92. 92. Data Structure of the Scatter Graph from the Prior Slide 92 date close volume open high low 16:00 30.61 6,225,686 30.47 30.73 30.41 ######## 30.35 9132305 31.11 31.13 30.16 ######## 30.7 5615095 31.05 31.1599 30.64 ######## 31.07 7219874 30.59 31.16 30.39 ######## 30.39 9324046 30.77 30.92 30.385 ######## 30.66 5397538 30.68 30.775 30.415 ######## 30.73 4003535 30.59 30.8 30.53 ######## 30.43 7475848 30.75 31.03 30.43 ######## 30.65 8697939 30.89 31.13 30.46 ######## 31.15 5244927 31.13 31.285 30.79 ######## 31.24 7836575 31.08 31.41 31.07 4/7/2017 31.07 4968497 31.14 31.25 30.83 4/6/2017 31.12 9198214 31.27 31.3574 30.65 4/5/2017 31.38 8600055 31.78 31.845 31.26 4/4/2017 31.75 9073817 32.01 32.17 31.61 4/3/2017 32.15 5842386 32.48 32.54 32.03 ######## 32.39 6709777 32.22 32.6 32.2 ######## 32.36 4124697 32.04 32.39 32.01 ######## 32.11 7239136 32.31 32.485 32.07 ######## 32.44 11092310 31.76 32.575 31.75 ######## 31.9 11019120 31.4 32.05 31.31 ######## 31.52 10347800 31.38 31.845 31.26 ######## 31.33 11218630 30.97 31.55 30.88 ######## 30.97 6600745 30.63 31.09 30.49 ######## 30.56 9621925 30.77 30.86 30.52 ######## 30.63 4093167 30.85 31.01 30.58 ######## 30.82 9432103 30.97 31.03 30.65 ######## 30.76 8728050 30.68 30.78 30.4235 ######## 30.74 4065243 30.67 30.92 30.44 ######## 30.52 4307008 30.49 30.61 30.32 ######## 30.5 5352321 30.4 30.55 30.35 ######## 30.55 7785827 30.83 30.92 30.4 3/9/2017 30.7 6664232 30.36 30.83 30.36 3/8/2017 30.35 4993331 30.59 30.6 30.325 3/7/2017 30.52 6833142 30.61 30.77 30.23 3/6/2017 30.69 10712340 30.41 31.03 30.29 3/3/2017 30.46 4421878 30.2 30.52 30.03 3/2/2017 30.2 6673690 30.33 30.48 30.09 3/1/2017 30.41 8348614 30.2 30.73 30.15 ######## 29.92 8553473 30.33 30.42 29.9 ######## 30.43 6475840 30.51 30.7 30.29 ######## 30.61 4930194 30.44 30.62 30.265 ######## 30.34 4368922 30.29 30.68 30.17 ######## 30.47 4596911 30.39 30.6 30.39 ######## 30.55 9231386 30.24 30.8 30.23 ######## 30.35 5004142 30.42 30.51 30.16 ######## 
30.51 4313938 30.58 30.75 30.41 ######## 30.695 7597440 30.08 30.7 30.07 ######## 30.3 9255701 29.74 30.4 29.57 ######## 29.72 6974038 30.09 30.12 29.51 ######## 29.91 8876308 30.19 30.28 29.81 2/9/2017 30.12 5763891 30.35 30.51 30.09 2/8/2017 30.21 7847457 30.47 30.57 29.96 2/7/2017 30.48 15176580 30.85 31.46 30.14 2/6/2017 31.06 13619680 31.19 31.5 30.96 2/3/2017 31.4 7520499 31.46 31.62 31.16 2/2/2017 31.46 4731048 31.53 31.59 31.29 2/1/2017 31.62 7434994 31.32 31.75 31.315 ######## 31.38 5170287 31.2 31.44 31.05 ######## 31.37 9097541 31.28 31.4 30.995 ######## 31.29 8259271 31.14 31.52 30.88 ######## 31 12695620 30.43 31.61 30.31 ######## 30.3 5237626 30.37 30.44 30.17 ######## 30.28 4774302 30.11 30.3 29.89
  93. 93. Stock Charts • Stock graphs (sometimes referred to as OHLC, or “open-high-low-close,” charts) show the ups and downs in stock valuations over time. • The name “OHLC” reflects the structure of the data: an identifier (a stock, a date, or some other identifier), followed by open, high, low, and close. • The open is the valuation of a stock at the start of the trading session. The high is the highest value of the stock during the trading period. The low is the lowest value of the stock in that period. The close is the closing value for that period.
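As a quick sanity check on data laid out this way, the structure can be sketched in Python. This is only an illustrative sketch, not part of Excel: the class name `OhlcRow` and the method `is_consistent` are hypothetical, and the two rows are copied from the Boeing (BA) table that appears later in this deck.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OhlcRow:
    """One row of stock-chart data: identifier, then open, high, low, close."""
    label: str
    open: float
    high: float
    low: float
    close: float

    def is_consistent(self) -> bool:
        # The high must bound the open and close from above, the low from below.
        return (self.low <= min(self.open, self.close)
                and self.high >= max(self.open, self.close))


# Two rows copied from the Boeing (BA) data table in this deck.
rows = [
    OhlcRow("4/24/2017", 181.75, 182.46, 181.61, 182.06),
    OhlcRow("4/21/2017", 179.42, 180.93, 179.42, 180.38),
]

for row in rows:
    print(row.label, "ok" if row.is_consistent() else "check this row")
```

A row that fails this check usually means the columns were pasted in the wrong order, which matters because the stock chart types expect the columns in a fixed sequence.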
  94. 94. Stock Charts (cont.) • The three examples were created from the online Nasdaq historical data site. Its “quotes” tab gives access to historical stock prices; only recent data was used for the following: The Boeing Company (BA), Alphabet, Inc. (GOOG), and Tesla, Inc. (TSLA). • Because all the visualizations come from a single source and are of one type, variety was introduced by using different Excel variations of this graph type. • http://www.nasdaq.com/symbol/ba/historical • http://www.nasdaq.com/symbol/goog/historical • http://www.nasdaq.com/symbol/tsla/historical
  95. 95. 95 date open high low close 10:07 182.53 183.4 182.53 183.385 4/24/2017 181.75 182.46 181.61 182.06 4/21/2017 179.42 180.93 179.42 180.38 4/20/2017 178.74 179.82 178.346 179.3 4/19/2017 178.47 179.9 178.16 178.4 4/18/2017 178.59 179.97 177.12 177.85 4/17/2017 175.84 179.08 175.65 179.02 4/13/2017 175.74 176.88 175.5639 175.62 4/12/2017 178.25 178.25 175.94 176.05 4/11/2017 177.5 178.6 176.96 178.57 4/10/2017 179 179.97 177.48 177.56 4/7/2017 178.39 179.09 177.26 178.85 4/6/2017 177.56 178.22 177.12 177.37 4/5/2017 179 180.18 176.89 177.08 4/4/2017 176.88 178.79 176.76 178.7 4/3/2017 177.08 177.73 175.5 176.65 3/31/2017 178.02 178.18 176.81 176.86 3/30/2017 177.25 178.49 177.22 177.98 3/29/2017 177.4 177.99 176.97 177.63 3/28/2017 176.29 177.683 175 177.36 3/27/2017 174.37 176.44 173.75 176.1 3/24/2017 176.85 177.53 174.7687 175.82 3/23/2017 177.34 177.85 176.59 177.26 3/22/2017 175.96 177.1 175.5 176.98 3/21/2017 179.99 180.25 175.5 175.96 3/20/2017 180.1 180.15 179.14 179.39 3/17/2017 178.31 180.3835 178.17 180.1 3/16/2017 179.82 180 177.64 178.19 3/15/2017 179 179.24 177.97 178.71 3/14/2017 178.54 179.69 177.71 178.73 3/13/2017 177.16 179.19 177.07 179.05 3/10/2017 181.9 181.97 177.92 178.7 3/9/2017 181.43 182.59 179.58 180.57 3/8/2017 182.4 182.694 181.49 181.74 3/7/2017 180.64 182.84 180.6209 182.02 3/6/2017 181.4 182.3 180.43 180.93 3/3/2017 183.04 183.58 181.45 182.18 3/2/2017 184 185.71 182.97 182.99 3/1/2017 181.85 184.8 181.82 183.91 2/28/2017 180.34 181.93 179.67 180.23 2/27/2017 178 179.8839 177.55 179.43 2/24/2017 176.75 178.8 176.1 177.44 2/23/2017 175.97 177 175.7 176.86 2/22/2017 174.98 175.75 174.01 175.36 2/21/2017 173.92 176.17 173.68 175.56 2/17/2017 170.41 173.25 170.4 172.71 2/16/2017 169.64 170.81 168.75 170.81 2/15/2017 168.42 169.95 168.35 169.3 2/14/2017 167.7 168.8 167.22 168.5 2/13/2017 166.45 169.07 166.35 168.03 2/10/2017 165.25 166.45 164.47 166.23 2/9/2017 164.67 165.086 164.06 164.28 2/8/2017 165 165.24 163.69 163.81 
2/7/2017 165 167.4199 164.8725 166.5 2/6/2017 162.42 164.08 162.38 163.98 2/3/2017 162.99 163.56 162.31 162.4 2/2/2017 163.22 163.97 160.82 162.26 2/1/2017 164.25 165.81 163.12 163.97 1/31/2017 164.96 165.1 163.22 163.42 1/30/2017 165.92 165.99 163.82 165.57 1/27/2017 169.21 169.8 167.01 167.7 1/26/2017 167.25 170 167.25 169.12 1/25/2017 163.59 168.65 163.24 167.36 1/24/2017 158.58 160.93 157.84 160.55
  96. 96. 96 date open high low close 10:24 865 867.5 862.81 866.64 4/24/2017 851.2 863.45 849.86 862.76 4/21/2017 842.88 843.88 840.6 843.19 4/20/2017 841.44 845.2 839.32 841.65 4/19/2017 839.79 842.22 836.29 838.21 4/18/2017 834.22 838.93 832.71 836.82 4/17/2017 825.01 837.75 824.47 837.17 4/13/2017 822.14 826.38 821.44 823.56 4/12/2017 821.93 826.66 821.02 824.32 4/11/2017 824.71 827.4267 817.0201 823.35 4/10/2017 825.39 829.35 823.77 824.73 4/7/2017 827.96 828.485 820.5127 824.67 4/6/2017 832.4 836.39 826.46 827.88 4/5/2017 835.51 842.45 830.72 831.41 4/4/2017 831.36 835.18 829.0363 834.57 4/3/2017 829.22 840.85 829.22 838.55 3/31/2017 828.97 831.64 827.39 829.56 3/30/2017 833.5 833.68 829 831.5 3/29/2017 825 832.765 822.3801 831.41 3/28/2017 820.41 825.99 814.027 820.92 3/27/2017 806.95 821.63 803.37 819.51 3/24/2017 820.08 821.93 808.89 814.43 3/23/2017 821 822.57 812.257 817.58 3/22/2017 831.91 835.55 827.1801 829.59 3/21/2017 851.4 853.5 829.02 830.46 3/20/2017 850.01 850.22 845.15 848.4 3/17/2017 851.61 853.4 847.11 852.12 3/16/2017 849.03 850.85 846.13 848.78 3/15/2017 847.59 848.63 840.77 847.2 3/14/2017 843.64 847.24 840.8 845.62 3/13/2017 844 848.685 843.25 845.54 3/10/2017 843.28 844.91 839.5 843.25 3/9/2017 836 842 834.21 838.68 3/8/2017 833.51 838.15 831.79 835.37 3/7/2017 827.4 833.41 826.52 831.91 3/6/2017 826.95 828.88 822.4 827.78 3/3/2017 830.56 831.36 825.751 829.08 3/2/2017 833.85 834.51 829.64 830.63 3/1/2017 828.85 836.255 827.26 835.24 2/28/2017 825.61 828.54 820.2 823.21 2/27/2017 824.55 830.5 824 829.28 2/24/2017 827.73 829 824.2 828.64 2/23/2017 830.12 832.46 822.88 831.33 2/22/2017 828.66 833.25 828.64 830.76 2/21/2017 828.66 833.45 828.35 831.66 2/17/2017 823.02 828.07 821.655 828.07 2/16/2017 819.93 824.4 818.98 824.16 2/15/2017 819.36 823 818.47 818.98 2/14/2017 819 823 816 820.45 2/13/2017 816 820.959 815.49 819.24 2/10/2017 811.7 815.25 809.78 813.67 2/9/2017 809.51 810.66 804.54 809.56 2/8/2017 807 811.84 803.1903 808.38 2/7/2017 
803.99 810.5 801.78 806.97 2/6/2017 799.7 801.67 795.2501 801.34 2/3/2017 802.99 806 800.37 801.49 2/2/2017 793.8 802.7 792 798.53 2/1/2017 799.68 801.19 791.19 795.695 1/31/2017 796.86 801.25 790.52 796.79 1/30/2017 814.66 815.84 799.8 802.32 1/27/2017 834.71 841.95 820.44 823.31 1/26/2017 837.81 838 827.01 832.15 1/25/2017 829.62 835.77 825.06 835.67 1/24/2017 822.3 825.9 817.821 823.87
  97. 97. 97 date volume open high low close 10:39 1,740,869 308 309.25 305.86 309.06 4/24/2017 5077771 309.22 310.55 306.0215 308.03 4/21/2017 4501958 302 306.4 300.42 305.6 4/20/2017 6145961 306.51 309.15 300.23 302.51 4/19/2017 3891145 302.46 306.62 302.11 305.52 4/18/2017 3034225 299.7 300.8399 297.9 300.25 4/17/2017 4128067 302.7 304 298.68 301.44 4/13/2017 9275682 296.7 307.39 295.3 304 4/12/2017 6043648 306.34 308.4481 296.32 296.84 4/11/2017 5718053 313.38 313.47 305.5 308.71 4/10/2017 7653623 309.15 313.7299 308.71 312.39 4/7/2017 4566632 297.5 302.69 297.15 302.54 4/6/2017 5517731 296.88 301.94 294.1 298.7 4/5/2017 7858565 302.04 304.88 294.2 295 4/4/2017 10108230 296.89 304.81 294.53 303.7 4/3/2017 13864850 286.9 299 284.58 298.52 3/31/2017 3293698 278.73 279.68 276.3197 278.3 3/30/2017 4141437 278.04 282 277.21 277.92 3/29/2017 3672526 278.34 279.6 275.54 277.38 3/28/2017 7978665 277.02 280.68 275 277.45 3/27/2017 6221361 260.6 270.57 259.75 270.22 3/24/2017 5637668 255.7 263.89 255.01 263.16 3/23/2017 3309844 255.39 257.672 253.3 254.78 3/22/2017 4056735 251.56 255.07 250.51 255.01 3/21/2017 6901555 262.83 264.8 250.24 250.68 3/20/2017 3601616 260.6 264.55 258.821 261.92 3/17/2017 6491018 264 265.33 261.2 261.5 3/16/2017 7127180 262.4 265.75 259.06 262.05 3/15/2017 5233365 257 261 254.27 255.73 3/14/2017 7581719 246.11 258.12 246.02 258 3/13/2017 3011280 244.82 246.85 242.781 246.17 3/10/2017 3062785 246.21 246.5 243 243.69 3/9/2017 3876494 247.63 248.66 243 244.9 3/8/2017 3726746 247 250.07 245.32 246.87 3/7/2017 3452587 251.92 253.89 248.32 248.59 3/6/2017 3353601 247.91 251.7 247.51 251.21 3/3/2017 2925481 250.74 251.9 249 251.57 3/2/2017 3345751 249.71 253.28 248.27 250.48 3/1/2017 4804963 254.18 254.85 249.11 250.02 2/28/2017 6073890 244.19 251 243.9 249.99 2/27/2017 11450160 248.17 248.36 242.01 246.23 2/24/2017 8166869 252.66 258.25 250.2 257 2/23/2017 14877090 264 264.66 255.56 255.99 2/22/2017 8537811 280.31 283.45 272.6 273.51 2/21/2017 
5647575 275.45 281.4 274.01 277.39 2/17/2017 6251469 265.8 272.89 264.15 272.23 2/16/2017 7063860 277.6 280 268.5 268.95 2/15/2017 4943879 280 282.24 276.44 279.76 2/14/2017 7341450 279.03 287.39 278.61 280.98 2/13/2017 7023072 270.74 280.7899 270.51 280.6 2/10/2017 3618336 269.79 270.95 266.11 269.23 2/9/2017 7812600 266.25 271.18 266.15 269.2 2/8/2017 3912428 257.35 263.36 256.2 262.08 2/7/2017 4244063 258.19 260 256.42 257.48 2/6/2017 3557600 251 257.82 250.63 257.77 2/3/2017 2185230 251.91 252.179 249.68 251.33 2/2/2017 2498799 248.34 252.42 247.71 251.55 2/1/2017 3953105 253.05 253.2 249.05 249.24 1/31/2017 4112013 249.24 255.89 247.7 251.93 1/30/2017 3798638 252.53 255.2899 247.1 250.63 1/27/2017 3161774 251.38 253 248.52 252.95 1/26/2017 3143717 254.29 255.74 250.75 252.51 1/25/2017 5145301 257.31 258.46 251.8 254.47 1/24/2017 4958144 250 254.8 249.65 254.61
  98. 98. Surface Charts • Surface graphs are 3-dimensional (3D) graphs with x, y, and z axes. • The setup for a surface graph requires some early data processing, not just three sets of data. • The assumption behind the data in a surface graph is that x and y are independent variables, and the values should be numeric. 98
  99. 99. Surface Charts (cont.) The data should be structured as a matrix, or what some call a “mesh,” because this information is the underlying data behind the 3D contour. To build this mesh, the x-axis values should occupy one row of data, and the y-axis values should occupy one column of data. The z-axis value (the height of each point of the mesh) is calculated at each intersection of x and y (in the green area). Sparse matrices (those with many empty cells, null values, or zeroes) do not work as well as fully defined ones. Note that the formula in each cell has to refer back to the y-column and the x-row using anchored references ($column letter for the y-column and $row number for the x-row).
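The mesh idea can be sketched outside of Excel. The following Python sketch uses the first three x and y values from the surface-graph data slide later in this deck, where each z value works out to y minus x. The worksheet layout (y values in column A, x values in row 1) and the formula shown in the comment are assumptions for illustration.

```python
# x values run across the top row; y values run down the first column.
x_row = [308.0, 309.22, 302.0]     # header row (x-axis)
y_col = [309.06, 308.03, 305.6]    # header column (y-axis)

# Each z cell looks back to its own row's y value and its own column's x value.
# In Excel this would be a mixed-reference formula such as =$A2-B$1
# (assumed layout: y values in column A, x values in row 1).
mesh = [[round(y - x, 2) for x in x_row] for y in y_col]

for y, z_row in zip(y_col, mesh):
    print(y, z_row)
```

Locking the column for y and the row for x is what lets a single formula be filled across the whole green area without being rewritten for each cell.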
  100. 100. Surface Charts (cont.) • Surface charts enable the visualizing of some interaction between the data represented in the x-axis and the y-axis. • The colors of the surface chart (represented as bands) represent similar values. • 3D surface charts may be depicted as wireframe contours, aerial view contour charts, and others. • 3D surface charts may be viewed to see overall data patterns. They may be used to visualize equations. They may be used to find optimum combinations between two sets of data (represented on the x and y axes). 100
  101. 101. Surface Charts (cont.) • 3D visualizations are difficult for people to use because data may be occluded or difficult to see. • Data labels are important; legends are important. • The positioning of the visualization is important. • The labeling of the three axes is important, so people know what is represented. • Excel enables all the above. • The background behind the data and how the 3D data visualization was arrived at will be important to help users contextualize the visualization. 101
  102. 102. 102
  103. 103. 103
  104. 104. 104
  105. 105. 105
  106. 106. Data Structure for the Prior Four Surface Graphs (a selection of data) 106 308 309.22 302 306.51 302.46 299.7 302.7 296.7 306.34 313.38 309.06 $1.06 -$0.16 $7.06 $2.55 $6.60 $9.36 $6.36 $12.36 $2.72 -$4.32 308.03 $0.03 -$1.19 $6.03 $1.52 $5.57 $8.33 $5.33 $11.33 $1.69 -$5.35 305.6 -$2.40 -$3.62 $3.60 -$0.91 $3.14 $5.90 $2.90 $8.90 -$0.74 -$7.78 302.51 -$5.49 -$6.71 $0.51 -$4.00 $0.05 $2.81 -$0.19 $5.81 -$3.83 -$10.87 305.52 -$2.48 -$3.70 $3.52 -$0.99 $3.06 $5.82 $2.82 $8.82 -$0.82 -$7.86 300.25 -$7.75 -$8.97 -$1.75 -$6.26 -$2.21 $0.55 -$2.45 $3.55 -$6.09 -$13.13 301.44 -$6.56 -$7.78 -$0.56 -$5.07 -$1.02 $1.74 -$1.26 $4.74 -$4.90 -$11.94 304 -$4.00 -$5.22 $2.00 -$2.51 $1.54 $4.30 $1.30 $7.30 -$2.34 -$9.38 296.84 -$11.16 -$12.38 -$5.16 -$9.67 -$5.62 -$2.86 -$5.86 $0.14 -$9.50 -$16.54 308.71 $0.71 -$0.51 $6.71 $2.20 $6.25 $9.01 $6.01 $12.01 $2.37 -$4.67 312.39 $4.39 $3.17 $10.39 $5.88 $9.93 $12.69 $9.69 $15.69 $6.05 -$0.99 302.54 -$5.46 -$6.68 $0.54 -$3.97 $0.08 $2.84 -$0.16 $5.84 -$3.80 -$10.84 298.7 -$9.30 -$10.52 -$3.30 -$7.81 -$3.76 -$1.00 -$4.00 $2.00 -$7.64 -$14.68 295 -$13.00 -$14.22 -$7.00 -$11.51 -$7.46 -$4.70 -$7.70 -$1.70 -$11.34 -$18.38 303.7 -$4.30 -$5.52 $1.70 -$2.81 $1.24 $4.00 $1.00 $7.00 -$2.64 -$9.68 298.52 -$9.48 -$10.70 -$3.48 -$7.99 -$3.94 -$1.18 -$4.18 $1.82 -$7.82 -$14.86 278.3 -$29.70 -$30.92 -$23.70 -$28.21 -$24.16 -$21.40 -$24.40 -$18.40 -$28.04 -$35.08 277.92 -$30.08 -$31.30 -$24.08 -$28.59 -$24.54 -$21.78 -$24.78 -$18.78 -$28.42 -$35.46
  107. 107. Radar Charts • Radar graphs, also known as spider graphs / charts, show quantitative measures on axes emanating from a center point. • Each axis represents a variable. • In total, the radar graph represents a dataset on multi-variate features. • Radar graphs may be used to compare multiple underlying datasets, assuming that these are somehow comparable. 107
  108. 108. insight 2.62; cause 3.11; discrep 0.91; tentat 2.32; certain 0.99; differ 2.89
  109. 109. affiliation 1.86; achieve 1.93; power 3.05; reward 0.73; risk 0.45
  110. 110. see 0.61; hear 0.35; feel 0.21
  111. 111. File; Analytic; Clout; Authentic; Tone
Area chart - Wikipedia.pdf; 97.06; 51.71; 29.09; 58.03
Bar chart - Wikipedia.pdf; 97.45; 53.41; 11.35; 50.76
Box plot - Wikipedia.pdf; 98.42; 47.61; 6.90; 27.01
Histogram - Wikipedia.pdf; 97.67; 48.28; 7.49; 42.60
Line chart - Wikipedia.pdf; 97.34; 48.49; 16.09; 80.38
Line graph - Wikipedia.pdf; 96.79; 46.55; 3.46; 37.85
Open-high-low-close chart - Wikipedia.pdf; 98.36; 46.74; 76.25; 12.73
Pie chart - Wikipedia.pdf; 97.72; 48.05; 7.32; 34.92
Radar chart - Wikipedia.pdf; 94.91; 48.90; 10.47; 49.27
Scatter plot - Wikipedia.pdf; 97.06; 51.44; 6.61; 52.82
Treemapping - Wikipedia.pdf; 97.76; 55.21; 8.76; 42.44
Waterfall chart - Wikipedia.pdf; 96.52; 45.36; 14.22; 98.24
  112. 112. Treemap Charts • Treemap diagrams are rectangular diagrams which convey frequency in terms of spatial area of smaller rectangles fitted inside the space. • Treemap diagrams, if they include nested rectangles within the larger rectangles, are hierarchy charts because they capture the relationships of the higher vs. the lower levels. • By convention, the largest rectangles (indicating highest counts by category) are to the left, and the smallest are to the right. 112
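Because a treemap encodes frequency as area, it can help to work out each category's share of the total before charting. A minimal Python sketch follows, using the four counts from the treemap data slide that follows; the variable names are illustrative only.

```python
# Counts copied from the treemap data slide that follows.
counts = {
    "assignment": 61385,
    "graded_survey": 1123,
    "practice_quiz": 2962,
    "survey": 896,
}

total = sum(counts.values())

# Each rectangle's area is proportional to its share of the total.
shares = {name: count / total for name, count in counts.items()}

for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:15s} {share:6.1%}")
```

When one category dominates as heavily as it does here, the smaller rectangles can become too thin to label, which is worth knowing before choosing this chart type.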
  113. 113. assignment 61385; graded_survey 1123; practice_quiz 2962; survey 896; total 66366
  114. 114. Word; Count: 9465008123 5035; amazon 2916; 2017 2861; https 2712; com 2063; just 873; like 783; get 771; 10154846242183124 734; one 712; order 698; time 650; now 637; company 568; please 541; prime 489; amzn 474; www 473; day 458; 10155257276294339 444; know 430; united 402; see 400; states 400; new 396; delivery 389; http 383; seattle 372; customer 369; status 364; even 360; sorry 359; retail 358; 122 355; service 355; 33207 350
  115. 115. 115 Very negative Moderately negative Moderately positive Very positive 1 : InternalsA mazon (@amazon) ~ Twitter 37 73 176 103
  116. 116. Sunburst Charts • Sunburst diagrams originated from pie charts. In sunburst diagrams, variables are depicted as portions of a circular ring. • Sunbursts are a form of hierarchical chart, showing the relationships between upper-level and lower-level elements, such as topics and sub-topics. • The elements closest to the center of the circle are the top-level topics. Farther out are the sub-topics, sub-sub-topics, and so on. (Or, some may prefer child topics, grandchild topics, great-grandchild topics.) It is the differentiation between the levels of information that makes this a hierarchical chart.
  117. 117. 117
  118. 118. Data Structure of the Sunburst Diagram in the Prior Slide
Nodes; Sub-nodes; No. Coding References
account; account access; 7
account; account business days; 4
account; account details; 3
account; account info; 2
account; account information; 9
account; account issues; 3
account; account specialist; 12
account; account specialist email; 3
account; account today; 1
account; amazon associate account; 3
account; bank account; 8
account; checking account; 1
account; createspace account; 2
account; email account; 26
Note the hierarchy with the “nodes” and “sub-nodes.” Note the alphabetization in both text (string) columns. Note the frequency counts in the “No. Coding References” column.
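The node / sub-node / count layout is essentially a two-level tree. A short Python sketch of that idea follows, using four of the rows from this slide; the variable names are illustrative and not anything Excel exposes.

```python
from collections import defaultdict

# (node, sub-node, number of coding references): four rows from this slide.
rows = [
    ("account", "account access", 7),
    ("account", "account business days", 4),
    ("account", "bank account", 8),
    ("account", "email account", 26),
]

# Inner ring: one segment per node. Outer ring: that node's sub-nodes.
hierarchy = defaultdict(dict)
for node, sub_node, count in rows:
    hierarchy[node][sub_node] = count

for node, children in hierarchy.items():
    # A parent segment spans the combined size of its children.
    print(node, sum(children.values()))
    for sub_node, count in children.items():
        print("   ", sub_node, count)
```

Keeping the parent label repeated on every row, as the table does, is what lets the chart work out which outer segments belong under which inner segment.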
  119. 119. 119 Name Sources References beautiful 1 782 day 1 4 employment 1 8 event 1 8 everyone 1 4 flags 1 5 friendly reminder 1 12 good 1 256 great photos 1 175 holiday 1 14 holiday festivities 1 7 holiday lights 1 4 home 1 162 job 1 8 listing 1 7 morning 1 4 offices 1 10 online 1 5 photo 2 453 picture 1 384 place 1 414 post 1 13 road 1 328 state 2 184 state offices 1 8 sunset 1 183 today 1 16 town 1 203 trip 1 361
  120. 120. 120 ✔ ✔ apps 1 ✔ ✔ game 1 ✔ ✔ income jaction 1 ✔ ✔play store 1 delivery date estimated delivery date 8 delivery date false delivery date 2 delivery delivery persons 2 delivery delivery service 2 delivery delivery vehicle 2 delivery estimated delivery date 8 delivery fake delivery log 5 delivery false delivery date 2 delivery outsourced delivery 1 delivery perfect delivery performance 2 delivery poor delivery experience 1 gift gift cards 2 gift great client gift 1 mail mail box 2 mail mail room 3 mail provided prayer rooms 1 office apartments office 1 office post office 2 order confirmation e-mail order confirmation e-mail 11 order current order isnt 3 order order confirmation e-mail 11 order order status 1 service delivery service 2 service design services 2 service seller support service 1 shipping amzl shipping 2 shipping day shipping 1 shipping free shipping 1
  121. 121. Histogram Charts • Histogram charts show the frequency distribution of numerical data over the comprehensive range of possible values. These are counts of how many times a certain score (or range of scores) appears. • As such, they give a sense of the density of the data. • Histograms are generally applied to continuous data. For categorical data, regular bar charts with spaces between the bars are often used.
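The counting behind a histogram can be sketched in a few lines of Python. The scores and the bin width below are hypothetical, invented only to show how values are grouped into bins and tallied.

```python
from collections import Counter

# Hypothetical scores, invented for illustration only.
scores = [52, 58, 61, 64, 67, 67, 70, 73, 75, 78, 81, 84, 88, 93]

BIN_WIDTH = 10  # assumed bin width


def bin_label(value: int) -> str:
    """Return the label of the bin that a value falls into, e.g. '60-69'."""
    low = (value // BIN_WIDTH) * BIN_WIDTH
    return f"{low}-{low + BIN_WIDTH - 1}"


frequencies = Counter(bin_label(s) for s in scores)

# Print a rough text histogram, one row per bin.
for label in sorted(frequencies):
    print(label, "#" * frequencies[label])
```

Changing the bin width changes the shape of the histogram, so it is worth trying more than one width before settling on a chart.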
  122. 122. 122
  123. 123. 123
  124. 124. Data Structures for the Two Related Histograms in the Prior Two Slides
Bins; Group Selfies Frequencies; Dronies Frequencies
0 - 5; 7; 3
6 - 12; 63; 21
13-19; 20; 9
20s - 30s; 943; 168
40s; 49; 10
50s; 25; 7
60s; 12; 2
70s ≥; 1; 4
Mixed; 224; 116
Unknowable; 27; 185
  125. 125. 125
  126. 126. Data Structure for the Theme Histogram in the Prior Slide
Row 1 : Internals (1) SpaceX
A : company 43; B : engine 66; C : engineering 34; D : landing 39; E : launch 142; F : mission 30; G : pad 37; H : real 29; I : rocket 105; J : space 101; K : spacex 51; L : stage 37; M : station 31; N : system 49; O : test 50; P : time 35; Q : units 27; R : vehicle 34; S : work 43
  127. 127. Box & Whisker Charts • Box and whisker diagrams enable the visualization of groups of numerical data in quartiles (data broken into 25%, or one-fourth, segments). The boxes in the boxplots show the range of values between the lower and upper quartiles for that variable. • The whiskers, or the lines running from the boxes, show the variability outside the upper and lower quartiles. The longer the lines, the greater the variability beyond the quartile ranges. • The data mapped in box plots are not assumed to be parametric, so there is no assumption of underlying statistical distributions. • Lines within the boxes indicate the median, or midpoint, where half the data is above and half the data is below.
  128. 128. Box & Whisker Charts (cont.) • Skewness shows the tendency of the data, that is, whether more scores trend high or trend low. • A short box means low dispersion or spread (not a large variety in numbers), while a long box means high dispersion or spread (a large variety of numbers). • Outliers are indicated as dots beyond the ends of the whiskers. • The boxes in box & whisker diagrams may be vertical or horizontal.
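The numbers behind a box and whisker plot can be worked out by hand or in code. Below is a minimal Python sketch using the first ten Alabama values from the data-structure slide that follows. The 1.5 × IQR rule for flagging outliers is a common convention and is assumed here; Excel's own settings may differ.

```python
import statistics

# The first ten values from the Alabama table later in this deck.
values = [32, 32.3, 31.8, 33.6, 32.8, 33.8, 26.4, 16.3, 35.2, 35.5]

# Three cut points split the data into four quartile groups.
q1, median, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1  # the height (or width) of the box

# Assumed convention: points beyond 1.5 * IQR from the box are outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [v for v in values if v < lower_fence or v > upper_fence]

print("Q1:", q1, "median:", median, "Q3:", q3)
print("outliers:", outliers)
```

Seeing the quartiles as plain numbers first makes it easier to tell whether a dot on the finished chart is a genuine outlier or a data entry problem.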
  129. 129. 129
  130. 130. Data Structure for the Box & Whisker Plot in the Prior Slide (partial snippet) 130 YearStart YearEnd LocationA bbr LocationD esc Data_Valu e 2011 2011AL Alabama 32 2011 2011AL Alabama 32.3 2011 2011AL Alabama 31.8 2011 2011AL Alabama 33.6 2011 2011AL Alabama 32.8 2011 2011AL Alabama 33.8 2011 2011AL Alabama 26.4 2011 2011AL Alabama 16.3 2011 2011AL Alabama 35.2 2011 2011AL Alabama 35.5 2011 2011AL Alabama 38 2011 2011AL Alabama 36.4 2011 2011AL Alabama 27.1 2011 2011AL Alabama 38.5 2011 2011AL Alabama 34.8 2011 2011AL Alabama 35.8 2011 2011AL Alabama 32.3 2011 2011AL Alabama 34.1 2011 2011AL Alabama 28.8 2011 2011AL Alabama 23.8 2011 2011AL Alabama 29.8 2011 2011AL Alabama 40.1 2011 2011AL Alabama 28.6 2011 2011AL Alabama 2011 2011AL Alabama 2011 2011AL Alabama 32.9 2011 2011AL Alabama 27.8 2011 2011AL Alabama 2011 2011AL Alabama 34.7 2011 2011AL Alabama 39 2011 2011AL Alabama 30.5 2011 2011AL Alabama 33.2 2011 2011AL Alabama 34.1
  131. 131. 131
  132. 132. Data Structure for the Box & Whisker Plot in the Prior Slide (partial snippet) 132 Hospital Referral Region Descriptio n Total Discharges Average Covered Charges Average Total Payments Average Medicare Payments AL - Dothan 91 $32,963.07 $5,777.24 $4,763.73 AL - Birmingha m 14 $15,131.85 $5,787.57 $4,976.71 AL - Birmingha m 24 $37,560.37 $5,434.95 $4,453.79 AL - Birmingha m 25 $13,998.28 $5,417.56 $4,129.16 AL - Birmingha m 18 $31,633.27 $5,658.33 $4,851.44 AL - Montgom ery 67 $16,920.79 $6,653.80 $5,374.14 AL - Birmingha m 51 $11,977.13 $5,834.74 $4,761.41 AL - Birmingha m 32 $35,841.09 $8,031.12 $5,858.50 AL - Huntsville 135 $28,523.39 $6,113.38 $5,228.40 AL - Birmingha m 34 $75,233.38 $5,541.05 $4,386.94 AL - Birmingha m 14 $67,327.92 $5,461.57 $4,493.57 AL - Dothan 45 $39,607.28 $5,356.28 $4,408.20 AL - Birmingha m 43 $22,862.23 $5,374.65 $4,186.02 AL - Birmingha m 21 $31,110.85 $5,366.23 $4,376.23 AL - Mobile 15 $25,411.33 $5,282.93 $4,383.73 AL - Huntsville 27 $9,234.51 $5,676.55 $4,509.11 AL - Mobile 27 $15,895.85 $5,930.11 $3,972.85 AL - Tuscaloos a 31 $19,721.16 $6,192.54 $5,179.38 AL - Mobile 18 $10,710.88 $4,968.00 $3,898.88 AL - Birmingha m 33 $51,343.75 $5,996.00 $4,962.45 AL - Birmingha m 29 $55,219.31 $5,710.31 $4,471.68 AL - Mobile 66 $14,948.15 $5,550.90 $4,219.90 AL - Birmingha m 19 $73,846.21 $4,987.26 $3,944.42 AK - Anchorage 23 $34,805.13 $8,401.95 $6,413.78 AZ - Phoenix 11 $34,803.81 $7,768.90 $6,951.45 AZ - Tucson 40 $24,474.75 $6,799.85 $5,764.87 AZ - Phoenix 18 $28,571.61 $9,133.00 $8,008.11 AZ - Tucson 12 $35,968.50 $6,506.50 $5,379.83 AZ - Tucson 42 $26,294.52 $6,083.42 $4,903.33 AZ - Phoenix 28 $26,771.78 $7,140.85 $6,133.57 AZ - Phoenix 20 $29,967.80 $6,978.75 $5,969.55 AZ - Phoenix 15 $27,349.40 $11,026.33 $9,056.06 AZ - Phoenix 18 $59,443.83 $8,487.44 $7,422.66
  133. 133. Waterfall Charts • Waterfall diagrams (aka “flying bricks chart” or “Mario chart,” or “bridge” in finance) capture intermediate positive or negative valuations of something—such as products or services, housing, or stocks. • The x-axis may be time, or it may be a variable. • The y-axis is some sort of measure. • In some charts, the starting and ending values are shown as full bars, while the intermediate values float (as floating steps) to various heights depending on their varying values. • A waterfall chart may show valuation variance over time. 133
  134. 134. Waterfall Charts (cont.) • This graph displays “the cumulative effect of sequentially introduced positive or negative values” (“Waterfall chart,” Mar. 2017). • There is a reasoned assumption that what has occurred before may affect what follows in the near term (or is part of a larger trend affecting both). • The depicted variables exist in a context and are interrelated.
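The “cumulative effect” can be made concrete with a small calculation. This Python sketch derives base, rise, fall, and running total columns of the kind used when a waterfall is built from a stacked column chart. The starting value and the changes are hypothetical, and the column names are chosen only to echo the tables in this deck.

```python
# Hypothetical starting value and sequential changes, invented for illustration.
start = 100.0
changes = [12.0, -5.0, 8.0, -20.0]

rows = []
running = start
for change in changes:
    previous = running
    running = previous + change
    rows.append({
        "base": min(previous, running),  # hidden column that the step floats on
        "rise": max(change, 0.0),        # visible height of an upward step
        "fall": max(-change, 0.0),       # visible height of a downward step
        "total": running,                # cumulative value after this step
    })

for row in rows:
    print(row)
```

In a stacked column chart, the base series would be given no fill so that only the rise and fall segments appear to float.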
  135. 135. 135
  136. 136. Data Structure for the Waterfall Chart in the Prior Slide 136 Base Fall Rise Total 4/24/2017 30.35 0 4/21/2017 30.7 0 30.7 0.35 4/20/2017 31.07 0 31.07 1 4/19/2017 30.39 30.39 0 -0.32 4/18/2017 30.66 0 30.66 0.25 4/17/2017 30.73 0 30.73 0.08 4/13/2017 30.43 30.43 0 -0.3 4/12/2017 30.65 0 30.65 0.22 4/11/2017 31.15 0 31.15 0.5 4/10/2017 31.24 0 31.24 0.09 4/7/2017 31.07 31.07 0 0.17
  137. 137. 137
  138. 138. Data Structure for the Waterfall Chart in the Prior Slide
Dates; Base; Fall; Rise; Total Changes
4/3/2017; 3.95; 0; 0; 0
4/4/2017; 3.95; 0; 0; 0
4/5/2017; 3.75; 0.02; 0; -0.2
4/6/2017; 3.75; 0; 0; 0
4/7/2017; 3.75; 0; 0; 0
4/10/2017; 3.8; 0; 0.05; 0.05
4/11/2017; 3.85; 0; 0.05; 0.05
4/12/2017; 3.7; 0.15; 0; -0.15
4/13/2017; 3.45; 0; 0.25; 0.25
4/17/2017; 3.5; 0; 0.05; 0.05
4/18/2017; 3.4; 0.1; 0; -0.1
4/19/2017; 3.4; 0; 0; 0
4/20/2017; 3.4; 0; 0; 0
4/21/2017; 3.5; 0; 0.1; 0.1
4/24/2017; 3.5; 0; 0; 0
4/25/2017; 3.4; 0.1; 0; -0.1
This one was made with the stacked vertical column chart feature. These are still not quite presenting correctly, but they’re close… The data is from the Nasdaq Historical Quotes tool.
  139. 139. Combo Chart • Combination graphs are those which mix data and present the findings in creative interlinked ways (optimally for new insights). • Combining data requires finesse because there are ways to introduce errors when mixing data. Data types may not align. Measures may not be accurately matched. Some data may be redundant. Etc. • There are many ways to create these. • Some of the earlier charts may be “combination” ones as well because of the integration of multiple variables and / or multiple datasets. 139
  140. 140. 140
  141. 141. Data Structure for the Combo Chart in the Prior Slide 141 function pronoun ppron i we you shehe they ipron article prep auxverb adverb conj negate Area chart - Wikipedi a.pdf 31.13 2.13 0.43 0.00 0.00 0.21 0.21 0.00 1.71 7.04 12.37 3.62 2.13 5.54 0.00 Bar chart - Wikipedi a.pdf 31.02 1.70 0.36 0.00 0.00 0.12 0.12 0.12 1.34 8.27 11.31 4.99 1.95 3.77 0.24 Box plot - Wikipedi a.pdf 33.08 1.79 0.90 0.07 0.00 0.07 0.00 0.75 0.90 9.86 12.17 4.33 1.34 4.26 0.45 Histogra m - Wikipedi a.pdf 31.79 2.45 0.23 0.00 0.10 0.03 0.00 0.10 2.22 9.14 11.16 4.34 1.79 3.94 0.33 Line chart - Wikipedi a.pdf 35.63 3.76 0.63 0.00 0.13 0.13 0.00 0.38 3.14 10.04 11.67 5.02 2.63 3.51 0.13 Line graph - Wikipedi a.pdf 33.69 3.52 0.41 0.05 0.04 0.04 0.02 0.27 3.11 8.56 11.60 4.86 1.72 4.07 0.52 Open- high-low- close chart - Wikipedi a.pdf 32.30 2.61 0.49 0.00 0.00 0.16 0.00 0.33 2.12 10.11 11.26 3.26 1.47 4.08 0.33 Pie chart - Wikipedi a.pdf 29.82 2.63 0.75 0.04 0.00 0.08 0.15 0.49 1.88 8.07 11.30 3.87 1.65 3.30 0.34 Radar chart - Wikipedi a.pdf 30.38 2.97 1.04 0.22 0.22 0.22 0.00 0.38 1.92 6.43 10.60 5.11 2.36 3.85 0.33 Scatter plot - Wikipedi a.pdf 34.01 2.95 0.86 0.07 0.14 0.07 0.43 0.14 2.09 9.51 11.02 5.26 1.37 4.61 0.36 Treemap ping - Wikipedi a.pdf 25.13 2.10 0.46 0.00 0.00 0.07 0.13 0.26 1.64 6.10 10.76 3.08 1.38 2.62 0.00 Waterfall chart - Wikipedi a.pdf 33.82 2.04 0.29 0.00 0.00 0.29 0.00 0.00 1.75 8.16 11.95 5.83 2.04 4.37 0.87
  142. 142. 3D Maps Geographical Imagery • The 3D Maps imagery is related to locational mapping on a digital 3D globe. • There should be at least one or two columns of locational information based on standard names for cities, states (or provinces), and countries. Regional names are also recognized. • The spellings of the names, though, should match the standard spellings that the tool recognizes. • There may be other columns of quantitative data related to the respective locations. This may be time data, demographic data, or various other relevant information.
  143. 143. 3D Maps Geographical Imagery (cont.) • To set up data for 3D imagery, enter some locations: city, state/province, country, and, say, years of residence. • Highlight the data. • Go to Insert -> 3D Maps. • Adjust the fields for the look-and-feel. • The maps are interactive (rotatable) and zoomable.
  144. 144. 144
  145. 145. Data Structure for the 3D Image in the Prior Slide: City; State; Country; Years of Residence
  146. 146. Some Tips for Creating Data Visualizations in Excel 2016 • Do a mental walk-through of the underlying data. • Consider what it is you want to communicate. • Create a number of versions of the data visualizations. Experiment broadly. • Add data visualization details. • Add surrounding information to ensure that the data visualization fits the context. 146
  147. 147. Going “Off-Script” within Excel Going with data visualization templates in Excel is a very fast way to portray structured data. However, there are some creative ways to re-visualize data in Excel by using existing capabilities. 147
  148. 148. (1) A Composite Multi-Graph Image • Let’s say that there is a need to create multiple interrelated graphs that must be exported as one file. • Click on the outside border of each element (holding Ctrl to select more than one), go to the Page Layout tab, and click on Group. This treats all the elements as one group, so that clicking on just one part of the image allows copying the entire composite into a photo editing tool. • If the elements are not grouped, it will be difficult to export the composite graphs as one image with a screen grab (since a single screen may not contain the entire composite). • Piecemeal copy-and-paste exports mean that the elements have to be recomposed in a tool like Microsoft Visio, with the attendant challenges of getting everything to align.
  149. 149. (2) Back-to-Back Bar Charts • Begin with a set of relatively comparable data with the same variables being compared (with a numerical measure). • Assess the data with a shared measure. • In Excel, create two separate horizontal bar charts. • If the results are quite different, rework the horizontal axes to have the same maximum number (so the two sides have a comparable base). • Add data labels for clarity of the bars. 149
  150. 150. (2) Back-to-Back Bar Charts (cont.) • Create a name label for the data visualization using a text box. • For one of the two horizontal bar charts, in the “Format Axis,” reverse the order of the values. • For the one with reversed values, delete the vertical axis with the numbers. • Create a text box with the variables centered. • Strive to align the two bar charts. (This is easier said than done because the horizontal bars are not the same thickness necessarily if the numbers are quite different.) 150
  151. 151. (2) Back-to-Back Bar Charts (cont.) • Add a white background to the image, so that the Excel cells do not show through. • If further cleanup work is needed, drop the image into Photoshop or another image editing tool, and clean it up before placing it. • Once an Excel graph is made into an image, it is no longer machine-readable or readable by screen readers, so informationally equivalent alt text should accompany the image.
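One of the earlier steps, giving both horizontal axes the same maximum, comes down to a small calculation. This Python sketch uses hypothetical counts and an assumed gridline spacing; the resulting number is what would be typed into the maximum bound of both charts' axes.

```python
import math

# Hypothetical counts for two groups measured on the same variables.
left = {"A": 42, "B": 17, "C": 63}
right = {"A": 118, "B": 95, "C": 77}

STEP = 20  # assumed gridline spacing

# Both charts get the same axis maximum so bar lengths are comparable.
largest = max(max(left.values()), max(right.values()))
shared_max = math.ceil(largest / STEP) * STEP

print("Set both horizontal axes to a maximum of", shared_max)
```

Without a shared maximum, Excel scales each chart to its own data, and a bar of 63 on one side can look as long as a bar of 118 on the other.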
  152. 152. (2) A Rough Example of a Back-to-Back Bar Chart 152
  153. 153. (3) A Stacked Pyramid Chart • Create a list of frequency data. • Highlight the frequency data, and sort from largest to smallest. Be sure to expand the selection, so the data labels move with the correct frequency amounts. • Insert a blank row between each pair of data rows, and put in a placeholder amount (say, 100). • Highlight the data, and insert a 3D 100% stacked column chart. • Right-click the data columns in the chart. In the Format Data Series window, select “Full Pyramid.”
  154. 154. (3) A Stacked Pyramid Chart (cont.) • With the chart highlighted, go to the Design tab, and click “Switch Row/Column.” The separate columns will coalesce into one pyramid. • Click on the left axis (100% to 0%), and select “Format Axis.” In the “Format Axis” window at the right, select “Values in reverse order.” • In the chart area, select any visual elements that are not desired, and press Delete to remove them. • Click the “plus” at the right of the chart and add elements that are desired (such as a Legend). • Adjust the size of the separators from 100 to another consistent number to create the sense of space between the reverse pyramid elements.
  155. 155. (3) A Stacked Pyramid Chart (cont.) • Right-click one of the placeholder layers in the visualization, and go to the Format Data Series window. In the “Fill” tab, select “No fill.” Do this for each of the placeholder layers to give a sense of physical distance between each of the actual data layers. • The “Enrollment Summary by College” data in the following table comes from the Office of the Registrar at Kansas State University, at http://www.k-state.edu/registrar/statistics/colleges.html. This is from 2016. • This data visualization type may align with sequential or pipeline data as well as others. 155
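The data preparation for the stacked pyramid (sort largest to smallest, then put a placeholder row between the data rows) can also be sketched outside Excel. This is a rough illustration under stated assumptions: the function name `pyramid_rows` and the sample counts are hypothetical, and the counts are not the Kansas State enrollment figures.

```python
def pyramid_rows(frequencies, spacer=100):
    """Sort (label, count) pairs largest-first and interleave spacer rows.

    Each spacer row stands for a placeholder layer that would later be
    set to "No fill" in Excel to create a visual gap between data layers.
    """
    ordered = sorted(frequencies, key=lambda pair: pair[1], reverse=True)
    rows = []
    for index, (label, count) in enumerate(ordered):
        if index > 0:
            rows.append(("", spacer))  # placeholder between two data rows
        rows.append((label, count))
    return rows


# Hypothetical frequency data, for illustration only.
for label, count in pyramid_rows([("B", 300), ("A", 1200), ("C", 50)]):
    print(f"{label or '(spacer)'}\t{count}")
```

Changing the `spacer` argument corresponds to adjusting the separator size from 100 to another consistent number.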
  156. 156. (3) A Stacked Pyramid Chart 156
  157. 157. Some Common Mistakes 157
  158. 158. Some Common Mistakes • Not ensuring that the underlying data behind a data visualization is correct • A lack of alignment and fit between the underlying data and the data visualization form • Going with a data visualization only because the software seems to enable it…but not working through the visualization to make sure that it makes sense both visually and data-wise • Confusing rates with actual measures, and others • Combining non-comparable data types • Having data in a cell which is not identified by accurate type (such as “date” information as “general” data or “number” information as “text” data) 158
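The last mistake above (cells whose stored type does not match their content) can be checked for before charting. The following is a minimal sketch, assuming the cell values have been exported as strings and that dates use an ISO-style `YYYY-MM-DD` layout; the function name `classify_cell` and that date format are assumptions for illustration.

```python
from datetime import datetime


def classify_cell(value, date_format="%Y-%m-%d"):
    """Return 'number', 'date', 'text', or 'empty' for a raw cell string."""
    text = value.strip()
    if not text:
        return "empty"
    try:
        # Thousands separators are removed before the numeric check.
        float(text.replace(",", ""))
        return "number"
    except ValueError:
        pass
    try:
        datetime.strptime(text, date_format)
        return "date"
    except ValueError:
        return "text"


# A column that should be numeric but contains stray text is easy to spot.
column = ["1,200", " 42 ", "n/a", "2017-08-03"]
print([classify_cell(cell) for cell in column])
```

A column that reports a mix of kinds is a signal to clean the data before building the visualization.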
  159. 159. Some Common Mistakes (cont.) • An incoherent data visualization enabling a wide variety of misinterpretations (or conflicting data in a data visualization) • Insufficient data visualization context • Poor labeling of data: insufficient labels, inaccurate labels, non-neutral language, illegibility, and / or others • Not spell checking data visualizations • Not studying the conventions of the data visualization • Assuming that viewers have the same level of background knowledge as the creator of the data visualization 159
  160. 160. Some Common Mistakes (cont.) • Excess data in the data visualization (such as extra decimal places on whole numbers, which produce a lot of trailing .00) • A 2D or 3D data visualization with excessive data and data element occlusion • A 4D data visualization with pacing that is too fast or too slow (or which does not enable viewer pacing or control) 160
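The point about excess decimal places can be illustrated with a small label formatter. This is a sketch only; the function name `tidy_label` and the two-decimal default are assumptions, and in Excel itself the same effect comes from the number format applied to the data labels.

```python
def tidy_label(value, max_decimals=2):
    """Format a number for a data label without uninformative decimals."""
    rounded = round(float(value), max_decimals)
    if rounded == int(rounded):
        # Whole numbers are shown with no decimal part at all.
        return f"{int(rounded):,}"
    # Otherwise keep up to max_decimals places, minus trailing zeros.
    return f"{rounded:,.{max_decimals}f}".rstrip("0")


for raw in (1200.00, 3.50, 2.75, -4.0):
    print(raw, "->", tidy_label(raw))
```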
  161. 161. One Main Realization • The work to conduct the research and to acquire the actual data takes about 95% of the effort and time, and drawing the data visualization takes about 5% of the effort…but the data visualization piece is also critical (because a lot can be compromised with improper drawing of the data visualization). 161
  162. 162. Adding Relevant Data Visualization Elements Data visualizations should be as simple as possible, with no extraneous elements that do not contribute to the overall meaning of the chart. 162
  163. 163. Common Data Visualization Elements • A clear noun-phrase title • Labels for the x- and y-axes (and sometimes y1 and y2 axes) • Data labels • Gridlines • A data table (for some data visualizations) • A legend • Error bars • Trendlines, and others 163
  164. 164. Graph Styles • Various style versions of the target graph • Background styles • Object handling • Texturing of objects and shapes • Font types and styles • 2D vs. 3D, and others 164
  165. 165. Range of Color Palettes • Ability to add a variety of colors in palettes that are aesthetically pleasing and of sufficient contrast for visual accessibility • Color palettes may be selected by dominant colors 165
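“Sufficient contrast” can be estimated numerically. The sketch below uses the contrast-ratio formula from the W3C Web Content Accessibility Guidelines (WCAG) 2.x, which the presentation does not name; it is offered as one possible measure, and the function names are hypothetical.

```python
def _linear(channel):
    """Convert an 8-bit sRGB channel to its linear-light value."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color):
    """Relative luminance of a color given as '#RRGGBB'."""
    hex_color = hex_color.lstrip("#")
    r, g, b = (int(hex_color[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(color_a, color_b):
    """Contrast ratio between two colors, from 1 (identical) up to 21."""
    la, lb = relative_luminance(color_a), relative_luminance(color_b)
    lighter, darker = max(la, lb), min(la, lb)
    return (lighter + 0.05) / (darker + 0.05)


# Black text on a white chart background gives the maximum ratio.
print(round(contrast_ratio("#000000", "#FFFFFF"), 2))
```

WCAG suggests a ratio of at least 4.5 for normal-size text and at least 3 for large text and graphical objects, so a palette's colors can be checked against the chart background with this function before they are applied.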
  166. 166. To Change Graph Colors… • To change the colors of the plot, highlight the plot. • In the Design tab of the ribbon, select “Change Colors.” A dropdown menu will enable the selection from a variety of color palettes. The palettes are divided into two sections: colorful (polychromatic and contrastive) and monochromatic (different shades of a particular color, often in gradients). 166
  167. 167. To Select Custom Colors… • Custom colors may be applied to particular elements. Just right click the element, and change the fill color. 167
  168. 168. 168
  169. 169. Dropdown Menus with Additional Options • Users have a high level of control for the look, feel, and function of the chart / graph elements. 169
  170. 170. MS Excel’s Page Layout Features • Excel has a variety of layout features that may enable in-graph editing. • Some of the features of this tab include the following: • Pre-built themes • Backgrounds • Scaling and sizing • Gridlines • Arrangements (bringing forward, sending back) • Auto alignment choices • Grouping, and others 170
  171. 171. Processing Graph Visualizations Outside of Excel 2016 171
  172. 172. 172 side-by-side data visualizations from different software tools
  173. 173. Several Main Ways to Export Excel Charts. Copy and Paste as a Linked Graph: • Can export data visualizations as a copy and paste (which will maintain the link to the original file, as long as the respective files’ locations are not changed) • Copied and pasted charts will maintain an alpha channel behind visual elements (so there is an invisible layer with 100% transparency) • Colors of the data graphs will change in PowerPoint based on the applied design styles and color palettes. Save as Template: • Can export data templates for use later on 173
  174. 174. 174
  175. 175. Several Main Ways to Export Excel Charts (cont.). Copy and Paste as an Image into a Digital Image Editing Software Program: • Can copy the graph by clicking on its outer edges, pressing CTRL + C (to save the image to the Windows clipboard), and pasting into Adobe Photoshop; then change the resolution, contrast, and aspect ratio as needed, and export / save the image as a .png, .tif, .jpg, .gif, or some other format. Copy and Paste as an Image into a Diagramming / Drawing Software Program: • Can copy the graph as an image into a diagramming / drawing software program (like Microsoft Visio) and add image overlays before outputting in the proprietary file format and then as a digital image 175
  176. 176. Microsoft Visio • For example, MS Visio offers the following: pre-made templates, forms, containers, call-outs, connectors, and others • There are overlays of shapes, text boxes, lines, fonts, and others • Shapes may be highlighted and operations may be applied to them: union, combine, fragment, intersect, subtract, join, trim, and others…through an activate-able Developer tab • To offer more control, users have gridlines, drag-able guidelines, automated positioning and alignment, grouping features, aspect-ratio controls, and others • Color-based themes and variants 176
  177. 177. 177
  178. 178. Built-in Templates and Online Templates for Excel (for Defined Applications) 178
  179. 179. Add-ins to Excel 2016 179
  180. 180. Faux data to display a streamgraph data visualization:
  Year   Variable 1   Variable 2   Variable 3   Variable 4   Variable 5
  2010   100          8            100          30           180
  2011   4            1            7            4            0
  2012   0            8            5            200          -180
  2013   7            1            5            10           0
  2014   200          50           0            12           20
  2015   3            0            5            40           0
  2016   -400         20           150          3            200
  2017   1            -80          45           100          4
  2018   800          82           -800         82           600
  181. 181. 181
  182. 182. 182
  183. 183. 183
  184. 184. What are Add-ins? • Add-ins are software programs built to function with Excel to add various types of functionalities: data analytics, data visualizations, QR code generation, expanded export file types, and others • Add-ins / add-ons are helpful because they add functionality to software that is already somewhat familiar 184
  185. 185. Where Can One Find Add-ins for Excel? • Some of the Excel add-ins are from Microsoft Research and may be activated within the tool. • Some are available for download from the Office Store (such as a free Streamgraph drawing add-on that creates area charts that vary over time). • Others are related to software programs (like Acrobat PDF) and enable richer ways to share / interchange file types. • Some are downloadable from CodePlex and GitHub (like Network Overview, Discovery and Exploration for Excel or “NodeXL”), for social media platform data extraction, network analysis, network graph drawing, and other capabilities. 185
  186. 186. Where Can One Find Add-ins for Excel? (cont.) • There are different directions for accessing different types of add-ins. • Some will require mere activation. • Those that are built into the tool will require mere activation, if that. • Those that come with other software programs will require mere activation, if that. • Some will require a download and some installation. • Some will require a download, but these may auto-install. 186
  187. 187. Activating Add-ins • In Excel, click the File tab. • Click Options. The Excel Options window opens. • Click “Add-ins” in the left menu. • A list of available add-ins will display in the window, in several categories: • Active Application Add-ins • Inactive Application Add-ins • Document Related Add-ins • Disabled Application Add-ins 187
  188. 188. Activating Add-ins (cont.) • Select an add-in of interest, and click “Go” at the bottom. • An “Add-ins” window will open, allowing the user to check boxes to activate add-ins or to uncheck boxes to de-activate them. • Click “OK” once the selections are decided. • These are global settings, and the add-ins should remain available for future uses of Excel. 188
  189. 189. Excel Options -> Add-ins Window 189
  190. 190. The “Add-ins” Window 190
  191. 191. A Note about Data 191
  192. 192. About Data • Data… • Has to be collected somewhere advertently or inadvertently • Has to be practically applied in some way (strategic, tactical, other) • May be pre-labeled or post-labeled • Structured datasets include the following: • What a thing is (data type) and generally how it relates to everything else • Dataset metadata include the following: • How the data was collected (hopefully with high standards and finesse) • When the data was collected • Who collected the data and how they should be cited 192
  193. 193. About Data (cont.) • Dataset metadata may be captured in data dictionaries if the dataset is a larger sized one • The fact that data is in the same set means there is some relatedness whether you can see it or not (or you may have brought unrelated contents into a dataset and are seeing relations that may not exist) • Handling data requires finesse: • Data handling should be back-stopped by protected raw datasets which are left pristine and unprocessed (so researchers can always grab another set to process differently) • How you clean and handle it matters (handling can introduce artifacts, mistakes, and skews) • Researchers can’t afford to be sloppy or unthinking 193
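The advice to keep protected raw datasets pristine can be supported with a small routine: work only on a copy, and record a checksum of the raw file so that any accidental change can be detected later. This is a minimal sketch; the function names and the folder layout are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def make_working_copy(raw_path, work_dir):
    """Copy the raw dataset into work_dir and return (copy_path, raw_digest).

    All cleaning is then done on the copy; the digest is kept so the raw
    file can be re-verified as untouched at any later point.
    """
    raw_path = Path(raw_path)
    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)
    copy_path = work_dir / raw_path.name
    shutil.copy2(raw_path, copy_path)
    return copy_path, sha256_of(raw_path)
```

Re-running `sha256_of` on the raw file and comparing it with the stored digest confirms that the original has not been altered, so another copy can always be taken and processed differently.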
  194. 194. About Data (cont.) • Having access to a data table or a dataset can give the deceptive sense of understanding • Data has to be understood from a deep background in the subject matter • Data has to be understood in the context of larger sets of data that may be cross-referenced expertly • Fragmentary data reveals in some cases and obfuscates in others 194
  195. 195. Data Visualization Standards 195
  196. 196. Some Common Standards for Data Visualizations • Data accuracy (underlying data; proper contextualization; source citations; disambiguation; correction of errors; non-manipulation of data consumers; differentiation between empirical, conceptual, and synthetic data) • Intellectual property protections (copyright) • Privacy protections (protection against re-identification of de-identified data) • Proper crediting of all sources • Accessibility through file versioning, alt texting, access to underlying databases, and captioning 196
  197. 197. Some Common Standards for Data Visualizations (cont.) • Human and machine readability of data tables • Contextualizing 197
  198. 198. Contact and Conclusion • Dr. Shalin Hai-Jew • iTAC • Kansas State University • 785-532-5262 • shalin@k-state.edu • Note: • The data sources have generally been cited close to the data visualization. • The presenter has no relationship to any of the software makers. 198
