Virtually every type of online learning involves some form of data visualization. Common data visualizations include timelines, process diagrams, line graphs, bar charts, pie charts, treemap diagrams, dendrograms, cluster diagrams, geographical maps, network graphs, word clouds, word networks, scatter diagrams, scatterplot matrices, intensity matrices, decision trees, and others. Indeed, there is also data in screenshots, photos, drawings, videos, and other types of visuals. Online dashboards contain rich data visualizations to convey dynamic data. Some data, such as big data, may only be conveyed visually for human understanding and interpretation; in raw form, the meaning is obscured and elusive. Data visualizations highlight salient aspects of data, and they have to be aligned to particular uses: (1) user awareness and understanding, (2) data analytics, and (3) decision-making. This session defines some best practices for informative and engaging data visualizations for online learning. Original real-world examples are provided from modern software programs.
Creating Effective Data Visualizations in Excel 2016: Some Basics. Shalin Hai-Jew
One of the mainstays of a modern software toolkit is Excel 2016, from Microsoft Office 2016. By reputation, Excel is considered a beginner’s tool that self-respecting data analysts would bypass, but Excel is fairly high-powered, can take up to 1,048,576 rows of data (about 1.05 million) per worksheet, contains complex statistical analysis capabilities (without the need for scripting), and enables rich data visualizations. It has a number of rich add-ons that empower different analytical and data visualization functionalities. It works as a great bridging tool to more complex types of statistical analyses.
This session walks participants through some basic built-in data visualizations in Excel 2016, including pie and doughnut charts, bar charts, treemaps and sunburst diagrams, cluster diagrams, spider (radar) charts, scatterplots, and others. The session covers how data structures and desired emphases determine the options for particular data visualizations.
In this session, participants will
review how to load a data table,
read the general data in a data table (or worksheet),
process or clean the data as needed,
use the Recommended Charts feature,
decide which built-in data visualizations to use, and
consider how to add relevant data visualization elements (including data labels, background grids, axis labels, and titles) for a coherent and effective data visualization.
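The chart-selection logic in the steps above can be sketched as a simple rule set. This is illustrative Python only; Excel's Recommended Charts feature uses its own (unpublished) heuristics, and the thresholds below are assumptions for demonstration:

```python
def recommend_chart(num_categories, num_series,
                    parts_of_whole=False, time_ordered=False):
    """Suggest a chart type from simple data-shape heuristics.

    Illustrative rules only; Excel's Recommended Charts feature
    uses its own (unpublished) heuristics.
    """
    if time_ordered:
        return "line chart"            # trends over ordered time points
    if parts_of_whole and num_series == 1 and num_categories <= 6:
        return "pie chart"             # a few slices of one whole
    if num_series >= 4:
        return "radar (spider) chart"  # many measures per category
    return "bar chart"                 # default categorical comparison

# One series of five categories that sum to a whole:
print(recommend_chart(5, 1, parts_of_whole=True))  # pie chart
```

The point of the sketch is the session's central idea: the structure of the data (number of series, whether it is time-ordered, whether categories are parts of a whole) constrains which visualizations are even candidates.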
Also, participants will help co-build data visualizations from open-source and other datasets.
Using Decision Trees to Analyze Online Learning Data. Shalin Hai-Jew
In machine learning, decision trees enable researchers to identify possible indicators (variables) that are important in predicting classifications, and these offer a sequence of nuanced groupings. For example, are there “tells” which would suggest that a particular student will achieve a particular grade in a course? Are there indicators that would identify learners who would select a particular field of study vs. another?
This session will introduce how decision trees are used to model data based on supervised machine learning (with labeled training-set data) and how such models may be evaluated for accuracy with test data, using the open-source tool RapidMiner Studio. Several related analytical data visualizations will be shared: 2D spatial maps, decision trees, and others. Attendees will also see how 2x2 contingency tables represent Type 1 and Type 2 errors, how the accuracy of a machine learning model may be assessed, and the strengths and weaknesses of decision trees applied to some use cases from higher education. Various examples of possible outcomes will be discussed, along with pre-modeling (vs. post hoc) theorizing about what may be seen in terms of particular variables. The basic data structure for running the decision tree algorithm will be described. If time allows, relevant parameters for a decision tree model will be discussed: criterion (gain_ratio, information_gain, gini_index, and accuracy), minimal size for split, minimal leaf size, minimal gain, maximal depth (based on the need for human readability of decision trees), confidence, and pre-pruning (and the desired level).
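The split criteria and the contingency-table accuracy named above can be computed directly. A minimal plain-Python sketch (not RapidMiner Studio's implementation):

```python
from math import log2

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    probs = [labels.count(c) / n for c in set(labels)]
    return -sum(p * log2(p) for p in probs)

def information_gain(parent, left, right):
    """Entropy reduction from splitting `parent` into `left` and `right`."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted

def accuracy(tp, fp, fn, tn):
    """Overall accuracy from a 2x2 contingency (confusion) table."""
    return (tp + tn) / (tp + fp + fn + tn)

parent = ["pass", "pass", "fail", "fail"]
print(gini(parent))                                                  # 0.5
print(information_gain(parent, ["pass", "pass"], ["fail", "fail"]))  # 1.0
print(accuracy(40, 10, 5, 45))                                       # 0.85
```

A split that perfectly separates the pass/fail labels yields the maximum information gain of 1.0 bit; a tree-induction algorithm chooses, at each node, the split that maximizes such a criterion.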
Formations & Deformations of Social Network Graphs. Shalin Hai-Jew
Social network graphs are node-link (vertex-edge; entity-relationship) diagrams that show relationships between people and groups. Open-source tools like NodeXL Basic (available on Microsoft’s CodePlex) enable the capture of network data from select social media platforms through third-party add-ons and social media APIs. From social groups, relational clusters are extracted with clustering algorithms which identify intensities of connections. Visually, structural relational data is conveyed with layout algorithms in two-dimensional space. Using these various layout options and built-in visual design features, it is possible to aesthetically “deform” the network graph data for visual effects. This presentation introduces novel datasets and novel data visualizations.
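The cluster-extraction idea can be illustrated with a minimal sketch that groups nodes into connected components. This is plain Python for illustration; the clustering algorithms in tools like NodeXL (e.g. Clauset-Newman-Moore) are considerably more sophisticated, and the small social graph below is invented:

```python
from collections import deque

def clusters(graph):
    """Group nodes of an adjacency-list graph into connected
    components (a simple stand-in for relational cluster extraction)."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        component, queue = set(), deque([start])
        while queue:                       # breadth-first traversal
            node = queue.popleft()
            if node in component:
                continue
            component.add(node)
            queue.extend(graph.get(node, ()))
        seen |= component
        components.append(component)
    return components

# Two separate friend groups in a small (hypothetical) social graph:
graph = {
    "ann": ["bob"], "bob": ["ann", "cat"], "cat": ["bob"],
    "dan": ["eve"], "eve": ["dan"],
}
print(clusters(graph))  # two clusters: {ann, bob, cat} and {dan, eve}
```

Once clusters like these are identified, a layout algorithm assigns the nodes coordinates in two-dimensional space, which is where the aesthetic "deformations" discussed in the session come in.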
Using Large-Scale LMS Data Portal Data to Improve Teaching and Learning (at K... Shalin Hai-Jew
With any learning management system, a byproduct of its function is data, which may be analyzed to improve awareness, decision-making, and actions. Kansas State University’s Canvas LMS instance recently made available its cumulative data, dating from the university’s first use of the system in 2013. These flat files open a window into how the university is harnessing its LMS, with some macro-level insights that may suggest areas for improving teaching and learning. This session describes some basic approaches to informatizing this empirical “big data”: reviewing the data dictionary, extracting basic descriptions of the respective data sets, conducting time-based comparisons, surfacing testable hypotheses from data inferences, and conducting other data explorations. This introduces initial data analysis work only, but it does not preclude front-end analysis of courses at the micro level, relational database queries of the data, and other potential follow-on work.
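Extracting basic descriptions from such a flat file takes only a few lines of Python. The column names and values below are invented for illustration and are not actual Canvas data-dictionary fields:

```python
import csv
import io
import statistics

# Hypothetical flat-file export; real Canvas files follow the data dictionary.
flat_file = io.StringIO(
    "course_id,enrollments,published\n"
    "101,35,true\n"
    "102,210,true\n"
    "103,12,false\n"
)

rows = list(csv.DictReader(flat_file))
enrollments = [int(r["enrollments"]) for r in rows]

print("courses:", len(rows))
print("mean enrollment:", statistics.mean(enrollments))
print("published share:", sum(r["published"] == "true" for r in rows) / len(rows))
```

Even descriptives this simple (counts, means, shares) support the macro-level comparisons the session describes, before any relational database work is attempted.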
The K-State Online Canvas LMS Data Portal and Five Years of Activated Third-P... Shalin Hai-Jew
The presenter will introduce the K-State LMS data portal, share some available insights from it, and focus on one particular facet of this big data: the third-party apps that K-State faculty, administrators, and staff have activated, and what that says about how Canvas is being used.
Canvas LMS data portal for the Kansas State University instance
A data dictionary: Version 1.16.2 (https://portal.inshosteddata.com/docs)
Data extraction and processing
What it can tell us: (un)available data and information
Activated third-party tools in K-State Online Canvas LMS instance
Some caveats
What this says about what K-Staters (early adopters) are using
Practical applications of this third-party app activation data
Adding value to LMS data portal data
Leveraging Flat Files from the Canvas LMS Data Portal at K-State. Shalin Hai-Jew
A lot of data are created in an LMS instance, and much of this can be analyzed for insight. In 2016, Instructure, the makers of Canvas, made their LMS data available to their customers through a data portal (updated monthly). This portal enables access to a number of flat files related to that particular instance. This presentation showcases how this big data was analyzed on a regular laptop with basic office software, to summarize Kansas State University’s use of the LMS. Methods for analysis include the following: basic descriptive statistics, survival analysis, computational linguistic analysis, and others.
The results are reported out with both numbers and data visualizations, including classic pie charts, line graphs, bar charts, mixed charts, word clouds, and others. The findings provide some insights about how to approach the data, how to use a data dictionary, and other methods for extracting the data for awareness and practical decision-making. This work also suggests next steps for more advanced analysis (using the flat files in a SQL database).
More information about this may be accessed at http://scalar.usc.edu/works/c2c-digital-magazine-spring--summer-2017/wrangling-big-data-in-a-small-tech-ecosystem.
Using the Qualtrics Research Suite as a Training LMS. Shalin Hai-Jew
While learning management systems (LMSes) are a particular technology class, plenty of other technologies have been harnessed and retrofitted for learning purposes. Alternate LMSes include social media platforms like Twitter and survey systems. This presentation summarizes the uses of an online research (survey) suite, Qualtrics, for large-scale compliance-based trainings at Kansas State University. Some affordances of Qualtrics as a training LMS include the following:
a multimedia approach (including mobile),
a file upload approach,
logic functions,
branching logic for versioning a training (based on profiles or behaviors),
a scoring feature for grading performance,
logic tools to set standards for pass/fail,
integration with Google Translate,
security features,
built-in data analytics tools,
a dashboard for versioning different reports for different data clients,
an API for recording training completions and performances in information systems,
easy data export for external analytics,
and unique “skins”.
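The scoring and pass/fail logic that such a training relies on can be modeled in a few lines of Python. This is a conceptual sketch only: Qualtrics configures the equivalent through its survey editor rather than code, and the questions, answer key, and 80% threshold below are invented for illustration:

```python
def score_training(responses, answer_key, pass_threshold=0.8):
    """Score a training attempt and apply a pass/fail standard.

    A conceptual model of survey-tool scoring logic; the threshold
    is an assumption, not a Qualtrics default.
    """
    correct = sum(responses.get(q) == a for q, a in answer_key.items())
    score = correct / len(answer_key)
    return score, score >= pass_threshold

key = {"q1": "b", "q2": "d", "q3": "a", "q4": "c", "q5": "b"}
responses = {"q1": "b", "q2": "d", "q3": "a", "q4": "a", "q5": "b"}
score, passed = score_training(responses, key)
print(f"score={score:.0%}, passed={passed}")  # score=80%, passed=True
```

The same score-and-threshold idea underlies the branching logic as well: a learner's computed score can route them to remediation content or to a completion page.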
Capitalizing on Machine Reading to Engage Bigger Data. Shalin Hai-Jew
What are some ways to select, say, 200 research articles to “close read” from a set of 2,000 PDF articles gleaned from library databases and Google Scholar? How can a researcher make sense of a trending issue in the flood of Tweets and retweets based on a particular hashtag (#) or keyword search, or in an especially lively Tweetstream from a particular social media account? People are dealing with ever more prodigious amounts of information, from a number of sources. Those who are savvy about using computers to aid their reading (through “distant reading” or “not-reading”) may find that they are able to cover much more ground. This presentation introduces the use of NVivo 11 Plus (matrix queries, word frequency counts, text searches and dendrograms, cluster analyses, topic modeling, and others) for multiple cases of distant reading to aid in academic and research work.
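At its simplest, distant reading starts with word frequency counts, sketched here in plain Python. NVivo 11 Plus layers stemming, configurable stopword lists, and visualization on top of this basic idea; the small stopword set below is an assumption for demonstration:

```python
import re
from collections import Counter

# A deliberately tiny stopword list; real tools use much larger ones.
STOPWORDS = {"the", "a", "of", "and", "to", "in", "is", "are"}

def word_frequencies(text, top_n=5):
    """Count the most frequent non-stopword terms in a text."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS).most_common(top_n)

sample = ("Distant reading uses computers to read texts at scale. "
          "Reading at scale surfaces themes a close reading may miss.")
print(word_frequencies(sample, top_n=3))  # [('reading', 3), ('at', 2), ('scale', 2)]
```

Frequency counts like these are the raw material for the word clouds, dendrograms, and cluster analyses the presentation demonstrates.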
Semantic Interoperability & Information Brokering in Global Information Systems. Amit Sheth
Amit Sheth, "Semantic Interoperability and Information Brokering in Global Information Systems," Keynote talk at IEEE-Metadata Conference, Bethesda, MD, USA, April 6, 1999.
Key coverage:
Use of ontologies for semantic interoperability (http://knoesis.org/library/resource.php?id=00277)
InfoHarness (http://knoesis.org/library/resource.php?id=00275) and VisualHarness (http://knoesis.org/library/resource.php?id=00267) demonstrate faceted search
MREF, which puts metadata on HREF, was well ahead of its time (see: http://knoesis.org/library/resource.php?id=00294)
Multi-ontology query processing in the OBSERVER system (http://knoesis.org/library/resource.php?id=00273)
Semantic Interoperability in Infocosm: Beyond Infrastructural and Data Intero... Amit Sheth
Amit Sheth, Keynote: International Conference on Interoperating Geographic Systems (Interop’97), Santa Barbara, December 3-4, 1997.
Related technical paper: http://knoesis.org/library/resource.php?id=00230
When digital learning objects (DLOs) were initially conceptualized, based on object-oriented programming, there were high hopes that people could build learning objects that were re-usable by others. DLOs have come a long way in the past few decades, and many are available for free in various repositories, referatories, digital libraries, and other sources. In a recent research project, the presenter explored what features of DLOs make them adoptable for online learning and created a ten-element model for DLO adoption. The reality is that adoption of DLOs is neither cost-free nor effort-free. The ten elements include the following categories:
Pedagogical Value
Learner Engagement
Presentational Features
Legal Considerations
Technological Features
Instructor (Adopter) Control
Applicability to the Respective Learning Contexts (Local Conditions)
Local Costs to Deploy
Labeling and Documentation, Contributor and Informational Source Crediting
Global Transferability and Adoptability
She then analyzed her decades of work in instructional design in higher education (and private industry) to see which features were addressed in the respective funded DLOs. She found discrepancies between what makes DLOs adoptable and what is actually built, and she suggests some practical ways to close those gaps with techniques and technologies, in order to further support and propel the “digital learning object economy”.
Knowledge maps for e-learning. Jae Hwa Lee, Aviv Segev
Maps such as concept maps and knowledge maps are often used as learning materials. These maps have nodes and links, nodes as key concepts and links as relationships between key concepts. From a map, the user can recognize the important concepts and the relationships between them. To build concept or knowledge maps, domain experts are needed. Therefore, since these experts are hard to obtain, the cost of map creation is high. In this study, an attempt was made to automatically build a domain knowledge map for e-learning using text mining techniques. From a set of documents about a specific topic, keywords are extracted using the TF/IDF algorithm. A domain knowledge map (K-map) is based on ranking pairs of keywords according to the number of appearances in a sentence and the number of words in a sentence. The experiments analyzed the number of relations required to identify the important ideas in the text. In addition, the experiments compared K-map learning to document learning and found that K-map identifies the more important ideas.
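The TF/IDF scoring the study relies on can be sketched in plain Python. This is a minimal version; the study's K-map construction then ranks keyword pairs by sentence-level co-occurrence, which is not shown here, and the three documents below are invented:

```python
import math
from collections import Counter

def tf_idf(docs):
    """Score each term in each document by term frequency times
    inverse document frequency (a minimal TF/IDF sketch)."""
    n = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for doc in tokenized for term in set(doc))
    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n / df[term])
            for term, count in tf.items()
        })
    return scores

docs = [
    "concept maps support learning",
    "knowledge maps support e-learning",
    "domain experts build knowledge maps",
]
scores = tf_idf(docs)
# "maps" occurs in every document, so its IDF (and thus its score) is zero,
# while rarer terms like "concept" score higher as candidate keywords.
print(scores[0]["maps"], scores[0]["concept"])
```

This is why TF/IDF suits keyword extraction: terms common to the whole collection are discounted, leaving the terms that characterize each document.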
"Mass Surveillance" through Distant Reading. Shalin Hai-Jew
Distant reading refers to the uses of computers to “read” texts by counting words, identifying themes and subthemes (through topic modeling), extracting sentiment, applying psychological analysis to the author(s), and otherwise finding latent or hidden insights. This work is based on research on “mass surveillance” across five text sets: academic, mainstream journalism, microblogging, Wikipedia articles, and leaked government data. The purpose was to capture some insights about the collective social discussions occurring around this issue in an indirect way. This presentation uses a variety of data visualizations (article network graphs, word trees, dendrograms, treemaps, cluster diagrams, line graphs, bar charts, pie charts, and others) to show how machines read and the types of summary data they enable (at computational speeds, at machine scale, and in a reproducible way). Also, some computational linguistic analysis tools enable the creation of custom dictionaries for unique types of applied research. The tools used in this presentation include NVivo 11 Plus and LIWC2015.
Making the Most of the New File Upload Question Feature in an LMS: Nine Appl... Shalin Hai-Jew
In Canvas and Qualtrics, a recently added feature enables learners (or survey participants) to upload digital files. While these tools have varying limits (on file sizes, file types, file handling, identification or anonymization of file uploaders, and the level of sharing of uploaded files), it is useful to think through assignment possibilities in order to maximize this feature. This presentation provides some preliminary instructional design guidance for building effective assignments using the file upload feature. The session also addresses considerations like intellectual property, privacy rights, and proper handling of digital contents by learners and instructors, as well as data security and protections.
Finally, there are discussions about storage limits for file uploads (within an online course), as well as digital preservation (whether the uploaded files are temporary and transient, semi-permanent, or permanent, for learning purposes).
Matrix Queries and Matrix Data Representations in NVivo 11 Plus. Shalin Hai-Jew
This slideshow, "Matrix Queries and Matrix Data Representations in NVivo 11 Plus," covers the following points:
Matrices and their basic structures
Types of elements (variables) for matrix comparisons
Setting up matrix queries in NVivo 11
Specific matrix “use cases” in qualitative and mixed methods research
Wrap-up
Using Qualtrics to Create Automated Online Trainings. Shalin Hai-Jew
When thinking about “transformational teaching and learning,” training may not be the first thing to come to mind.
The Qualtrics® research suite offers a number of design tools and features that enable the building of automated online trainings. There are the baseline features such as the ability to integrate multimedia, apply various question designs, enable accessibility features (like alt-texting), deliver a mobile experience, reach learners across distances, and provide basic security and data integrity features.
Other features make this tool phenomenally powerful. One is the ability to richly customize learning sequences by learner profile, by performance (behavior), by selection, or by a mix of factors. Another is the scoring of learner responses and the ability to set a threshold for passing. The tool has rich data analytics capabilities (including a light item analysis), with online analytics and even cross-tabulation analysis. A Qualtrics® API enables automated recording of online assessment scores and learner behaviors in faculty, staff, or student information systems.
Trainings are critical for effective workplace functioning and professional development. The same features in Qualtrics® that enable the effective building of automated trainings also enable the effective building of pre-learning modules or sequences for learners who need to refresh their skills for a new course. This digital slideshow introduces the use of Qualtrics® as a customizable training and pre-learning module tool.
Building a Digital Learning Object w/ Articulate Storyline 2. Shalin Hai-Jew
The digital learning object (DLO) is still a common staple in online learning. One of the more sophisticated authoring tools for building DLOs is Articulate Storyline 2, which enables the integration of multimedia (including screen captures with Articulate Replay), the building of animations, branching, and other features. Its packaging allows a full range of SCORM and Tin Can API outputs and versioning in HTML5. This presentation introduces the software tool and some of its capabilities, to provide a sense of where digital learning objects may be headed.
Using the Qualtrics Research Suite as a Training LMS Shalin Hai-Jew
While learning management systems (LMSes) are a particular technology class, plenty of other technologies have been harnessed and retrofitted for learning purposes. Alternate LMSes include social media platforms like Twitter and survey systems. This presentation summarizes the uses of an online research (survey) suite, Qualtrics, for large-scale compliance-based trainings at Kansas State University. Some affordances of Qualtrics as a training LMS include the following:
a multimedia approach (including mobile),
a file upload approach,
logic functions,
branching logic for versioning a training (based on profiles or behaviors),
a scoring feature for grading performance,
logic tools to set standards for pass/fail,
integration with Google Translate,
security features,
built-in data analytics tools,
a dashboard for versioning different reports for different data clients,
an API for recording training completions and performances in information systems,
easy data export for external analytics,
and unique “skins”.
Capitalizing on Machine Reading to Engage Bigger DataShalin Hai-Jew
What are some ways to select, say, 200 research articles to “close read” from a set of 2,000 PDF articles gleaned from library databases and Google Scholar? How can a researcher make sense of a trending issue in the flood of Tweets and RT based on a particular hashtag (#) or keyword search or an especially lively Tweetstream based on a particular social media account? People are dealing with ever more prodigious amounts of information—from a number of sources. Those who are savvy to the uses of computers to aid their reading (through “distant reading” or “not-reading”) may find that they are able to cover much more ground. This presentation introduces the use of NVivo 11 Plus (matrix queries, word frequency counts, text searches and dendrograms, cluster analyses, topic modeling, and others) for multiple cases of distant reading to aid in academic and research work.
Semantic Interoperability & Information Brokering in Global Information SystemsAmit Sheth
Amit Sheth, "Semantic Interoperability and Information Brokering in Global Information Systems," Keynote talk at IEEE-Metadata Conference, Bethesda, MD, USA, April 6, 1999.
Key coverage:
Use of ontologies for semantic interoperability (http://knoesis.org/library/resource.php?id=00277); InfoHarness (http://knoesis.org/library/resource.php?id=00275) and VisualHarness (http://knoesis.org/library/resource.php?id=00267) demonstrate faceted search; MREF - putting metadata on HREF is way ahead of its time (see: http://knoesis.org/library/resource.php?id=00294); multi-ontology query processing in OBSERVER system (http://knoesis.org/library/resource.php?id=00273)
Semantic Interoperability in Infocosm: Beyond Infrastructural and Data Intero...Amit Sheth
Amit Sheth, Keynote: International Conference on Interoperating Geographic Systems (Interop’97), Santa Barbara, December 3-4 1997.
Related technical paper: http://knoesis.org/library/resource.php?id=00230
When digital learning objects (DLOs) were initially conceptualized, based on object-oriented programming, there were initial high hopes that people could build learning objects that were re-usable by others. DLOs have come a long way in the past few decades, and many are available for free on various repositories, referatories, digital libraries, and other sources. In a recent research project, the presenter explored what features of DLOs make them adoptable for online learning and created a ten-element model for DLO adoption. The reality is that adoption of DLOs is not cost-free and not effort-free. The ten elements include the following categories:
Pedagogical Value
Learner Engagement
Presentational Features
Legal Considerations
Technological Features
Instructor (Adopter) Control
Applicability to the Respective Learning Contexts (Local Conditions)
Local Costs to Deploy
Labeling and Documentation, Contributor and Informational Source Crediting
Global Transferability and Adoptability
She then analyzed her decades of work in instructional design in higher education (and private industry) to see what features were addressed in the respective funded DLOs. She found discrepancies between what makes DLOs adoptable and what is built and suggests some practical ways to close those gaps with techniques and technologies, in order to further support and propel the “digital learning object economy”.
Knowledge maps for e-learning. Jae Hwa Lee, Aviv Segev
Maps such as concept maps and knowledge maps are often used as learning materials. These maps havenodes and links, nodes as key concepts and links as relationships between key concepts. From a map, theuser can recognize the important concepts and the relationships between them. To build concept orknowledge maps, domain experts are needed. Therefore, since these experts are hard to obtain, the costof map creation is high. In this study, an attempt was made to automatically build a domain knowledgemap for e-learning using text mining techniques. From a set of documents about a specific topic,keywords are extracted using the TF/IDF algorithm. A domain knowledge map (K-map) is based onranking pairs of keywords according to the number of appearances in a sentence and the number ofwords in a sentence. The experiments analyzed the number of relations required to identify theimportant ideas in the text. In addition, the experiments compared K-map learning to document learningand found that K-map identifies the more important ideas
"Mass Surveillance" through Distant ReadingShalin Hai-Jew
Distant reading refers to the uses of computers to “read” texts by counting words, identifying themes and subthemes (through topic modeling), extracting sentiment, applying psychological analysis to the author(s), and otherwise finding latent or hidden insights. This work is based on research on “mass surveillance” across five text sets: academic, mainstream journalism, microblogging, Wikipedia articles, and leaked government data. The purpose was to capture some insights about the collective social discussions occurring around this issue in an indirect way. This presentation uses a variety of data visualizations (article network graphs, word trees, dendrograms, treemaps, cluster diagrams, line graphs, bar charts, pie charts, and others) to show how machines read and the types of summary data they enable (at computational speeds, at machine scale, and in a reproducible way). Also, some computational linguistic analysis tools enable the creation of custom dictionaries for unique types of applied research. The tools used in this presentation include NVivo 11 Plus and LIWC2015.
Making the Most of the New File Upload Question Feature in an LMS: Nine Appl...Shalin Hai-Jew
In Canvas and Qualtrics, a recent new feature enables learners (or survey participants) to upload digital files. While these have varying limits—of file sizes, of file types, of file handling, identification or anonymization of file uploaders, and the level of sharing of uploaded files—it is useful to think of assignment possibilities in order to maximize this feature. This presentation provides some preliminary instructional design for how to build effective assignments using the file upload feature. This session also involves considerations like intellectual property, privacy rights, and proper handling of digital contents by learners and instructors. There are also considerations for data security and protections.
Finally, there are discussions about memory limits for file uploads (within an online course), as well as digital preservation (whether the uploaded files are temporary and transient or semi-permanent or permanent, for learning purposes).
Matrix Queries and Matrix Data Representations in NVivo 11 PlusShalin Hai-Jew
This slideshow, "Matrix Queries and Matrix Data Representations in NVivo 11 Plus," covers the following points:
Matrices and their basic structures
Types of elements (variables) for matrix comparisons
Setting up matrix queries in NVivo 11
Specific matrix “use cases” in qualitative and mixed methods research
Wrap-up
Semantic Interoperability and Information Brokering in Global Information Sys...Amit Sheth
Amit Sheth, "Semantic Interoperability and Information Brokering in Global Information Systems," Keynote talk at IEEE-Metadata Conference, Bethesda, MD, USA, April 6, 1999.
Using Qualtrics to Create Automated Online TrainingsShalin Hai-Jew
When thinking about “transformational teaching and learning,” training is not usually the first thing to come to mind.
The Qualtrics® research suite offers a number of design tools and features that enable the building of automated online trainings. There are the baseline features such as the ability to integrate multimedia, apply various question designs, enable accessibility features (like alt-texting), deliver a mobile experience, reach learners across distances, and provide basic security and data integrity features.
Other features actually make this tool phenomenally powerful. One is the ability to richly customize learning sequences—by learner profile, by performance (behavior), by selection, or a mix of factors. There is a feature that enables the scoring of learner responses and the ability to set a threshold for passing. This tool has a rich data analytics capability (including a light item analysis), including online analytics and even cross-tabulation analysis. A Qualtrics® API enables the recording of online assessment scores and learner behaviors, in an automated way to faculty / staff / student information systems.
Trainings are critical for effective workplace functioning and professional development. The same features in Qualtrics® that enable the effective building of automated trainings also enable the effective building of pre-learning modules or sequences for learners who need to refresh their skills for a new course. This digital slideshow introduces the use of Qualtrics® as a customizable training and pre-learning module tool.
Building a Digital Learning Object w/ Articulate Storyline 2Shalin Hai-Jew
The digital learning object (DLO) is still a common staple in online learning. One of the more sophisticated authoring tools to build DLOs is Articulate Storyline 2, which enables the integration of multimedia (including screen captures with Articulate Replay), the building of animations, branching, and other features. Its packaging allows a full range of SCORM and Tin Can API outputs and versioning in HTML 5. This presentation will introduce the software tool and some of its capabilities to provide a sense of where digital learning objects may be headed.
INDIAN STATISTICAL INSTITUTE
Documentation Research & Training Centre
8th Mile, Mysore Road, RVCE Post
Bangalore-560 059
DRTC Seminar- 5
2014
Data Literacy
ABSTRACT
In our increasingly data-driven society, data literacy is an important civic skill which we should be developing. Data are slowly but steadily forcing their way into our societies. Data literacy may seem less technical than computer science or other fields. Still, we need to engage a wide variety of tools for accessing, converting, and manipulating data. These require an understanding of relational databases (like MS Access), data manipulation techniques, statistical software tools (like Minitab, SPSS, STATA, and MS Excel), and data representation software tools (like MS PowerPoint and MS Excel). This seminar includes an introduction to data literacy and its inter-relationship with information literacy and statistical literacy. It also includes various steps for working with data, followed by a short demonstration of data analysis techniques using the software STATA11.
Speaker: Jayanta Kr. Nayek
Date: 29.10.2014. Time: 2 p.m.
Venue: DRTC, ISI Bangalore.
All are cordially invited.
Seminar Coordinator
Biswanath Dutta
Data visualization in data science: exploratory (EDA) and explanatory visualization; Anscombe's quartet, design principles, visual encoding, design engineering and journalism, choosing the right graph, narrative structures, technology and tools.
Building Surveys in Qualtrics for Efficient AnalyticsShalin Hai-Jew
Qualtrics® is a state-of-the-art online research suite which enables sophisticated data collection and analytics. This presentation will describe how to build a survey for efficient analytics, both within Qualtrics® and outside Qualtrics®. This presentation emphasizes the importance of thinking through the data collection, the analytics, and the data presentation, in order to build a survey instrument that works for the research context. Along the way, some of the cutting-edge survey-building capabilities of Qualtrics® (including rich question types, invisible questions, branching logic, display logic, panel triggers, and others), will be showcased along with the data analytics functionalities (including cross-tab analysis and data visualizations).
A walk through the maze of understanding Data Visualization using several tools such as Python, R, Knime and Google Data Studio.
This workshop is hands-on, and this set of presentations is designed to serve as an agenda for the workshop.
Prerequisites of DBMS
Course Objectives of DBMS
Syllabus
What is the meaning of data and database
DBMS
History of DBMS
Different Databases available in Market
Storage areas
Why to Learn DBMS?
People who work with Databases
Applications of DBMS
Long nonfiction chapters are not in style and may never have been. While average nonfiction book chapters run about 4,000 – 7,000 words in length, some run several times that upper figure. The explanation is that there is some irreducible complexity that the chapter addresses that cannot be handled in shorter form. This slideshow explores some methods for writing longer chapters while still maintaining coherence, focus, and reader interest…and while using some technological tools to write and edit more efficiently.
Overcoming Reluctance to Pursuing Grant Funds in AcademiaShalin Hai-Jew
Starting as an organization’s new grant writer can be a challenge, especially in a case where there has been a time lapse since the last one left. People get out of the habit of pursuing grant funds. This slideshow addresses some of the reasons for such reluctance and proposes some ways to mitigate these.
Writing grants is one common way that those in institutions of higher education may acquire some funds—small and big, one-off and continuing—to conduct research, hire faculty and researchers and learners and others, update equipment, update or build up new buildings, and achieve other work. This slideshow explores some aspects of the work of grant writing in the present moment in higher education.
Contrasting My Beginner Folk Art vs. Machine Co-Created Folk Art with an Art-...Shalin Hai-Jew
The SARS-CoV-2 pandemic inspired several years of experimentation with common or folk art, involving mixed media, alcohol ink painting, and other explorations. Then, with the emergence of art-making generative AIs, there were further experiments, particularly with one that enables generation of visuals from scanned art and photos, text prompts, style overlays, and text-based visual modifiers. While both types of artmaking are emotionally satisfying and helpful for stress management, there are some contrasting differences. This exploratory slideshow explores some of these differences in order to partially shed light on the informal usage of an art-making generative AI (artificial intelligence).
Creating Seeding Visuals to Prompt Art-Making Generative AIsShalin Hai-Jew
Art-making generative AIs have come to the fore. A basic work pipeline typically involves starting with text prompts -> generated images. That image may be used to seed further iterations. Deep Dream Generator (DDG) enables the application of “modifiers” of various types (artist styles, visual adjectives, others) to be applied in addition to the text prompt.
Another approach involves beginning with a “seeding image,” a born-digital or digitized (born-analog) visual on which AI-generated art may be based for a multi-channel and multi-modal prompt. This slideshow provides some observations of how to think about seeding images, particularly in terms of how the DDG handles them, with its “algorithmic pareidolia” (“Deep Dream,” Wikipedia, July 3, 2023).
Human art-making often feeds into mass-scale conversations. Artists are thought to help bridge humanity into the future. Whether generative AI art enables this or not is still not clear.
Common Neophyte Academic Book Manuscript Reviewer MistakesShalin Hai-Jew
The work of academic book reviewing, as a volunteer (most often), is a common academic practice. The presenter served as a neophyte reviewer for some years before settling into this invited volunteer work for several decades. There have been lessons learned over time about avoidable mistakes…from both experience and observation.
Fashioning Text (and Image) Prompts for the CrAIyon Art-Making Generative AIShalin Hai-Jew
CrAIyon (formerly DALL-E Mini, named after Salvador Dalí) is a web-facing art-making generative AI tool online (https://www.craiyon.com/) that enables the use of text (and image) prompts for the creation of watermarked, lightweight visuals. Counterintuitively, the rough visuals are much more usable for recombinations, remixes, and recreations into usable digital visuals for various digital learning objects. The textual prompts are not particularly intuitive because of how the generative AI program was trained on mass-scale visuals. There is an art and occasional indirection to reworking prompts after each try, with the resulting nine-image proof sheets that CrAIyon outputs. The tool can be used iteratively for different outputs.
The tool sometimes turns out serendipitous surprises, including an occasional work so refined that it can be used / shared almost unedited. One challenge in using CrAIyon comes from their request for credit (for all non-subscribers to their service). Another comes from the visual watermarking (orange crayon at the bottom right of the image). However, this tool is quite useful for practical applications if one is willing to engage deep digital image editing (Adobe Photoshop, Adobe Illustrator).
Augmented Reality in Multi-Dimensionality: Design for Space, Motion, Multiple...Shalin Hai-Jew
Augmented reality (AR)—the use of digital overlays over physical space—manifests in a wide range of spaces (indoor, outdoor; virtual) and ways (in real space (with unaided human vision); in head gear; in smart glasses; on mobile devices, and others). There are various authoring technologies that enable the making of AR experiences for various users. This work uses a particular tool (Adobe Aero®) to explore ways to build AR for multiple dimensions, including the fourth dimension (motion, changes over time).
Based on the respective purposes of the AR experience, some basic heuristics are captured for
space design (1),
motion design (2),
multiple perception design (sight, smell, taste, sound, touch) (3),
and virtual- and tangible- interactivity (4).
Some Ways to Conduct SoTL Research in Augmented Reality (AR) for Teaching and...Shalin Hai-Jew
One of the extant questions about augmented reality (AR) is how (in)effective it is for the teaching and learning in various formal, nonformal, and informal contexts. The research literature shows mixed findings, which are often highly context-based (and not generalizable). There are some non-trivial costs to the design/development/deployment of AR for teaching and learning. For the users, there is cognitive load on the working memory [(1) extraneous/poor design, (2) intrinsic/inherent difficulty in topic, and (3) germane/forming schemas]. For teachers, there are additional knowledge, skills, and abilities / attitudes (KSAs) that need to be brought to bear.
Exploring the Deep Dream Generator (an Art-Making Generative AI) Shalin Hai-Jew
The Deep Dream Generator was created by Google engineer Alexander Mordvintsev in 2014. It has a public facing instance at https://deepdreamgenerator.com/, which enables people to use text prompts and image prompts (individually or in combination) to inspire the art-generating generative AI to output images. This work highlights some process-based walk-throughs of the tool, some practical uses, some lightweight art learning, some aspects of the online social community on this platform, and other insights. Some works by the AI prompted by the presenter may be seen here: https://deepdreamgenerator.com/u/sjjalinn.
(This is the first draft of a slideshow that will be used in a conference later in the year.)
Augmented Reality for Learning and AccessibilityShalin Hai-Jew
Recently, the presenter conducted a systematic review of the academic literature and an environmental scan to learn how to set up an augmented reality (AR) shop at an institution of higher education. The ambition was to not only set up AR in an accessible and legal way but also be able to test for potential +/- effects of AR on teaching and learning. The research did not go past the review stage, because of a lack of funding, but some insights about accessibility in AR were acquired.
(The visuals are from Deep Dream Generator and CrAIyon.)
Engaging Pixabay as an open-source contributor to hone digital image editing,...Shalin Hai-Jew
This slideshow describes the author's early experiences with creating two accounts on Pixabay in order to advance digital editing skills in multimedia. The two accounts are located at https://pixabay.com/users/sjjalinn-28605710/ and https://pixabay.com/users/wavegenerics-29440244/ ...
This work explores four main spaces where researchers publish about educational technology: academic-commercial, open-access, open-source, and self-publishing.
Human-Machine Collaboration: Using art-making AI (CrAIyon) as cited work, o...Shalin Hai-Jew
It is early days for generative art AIs. What are some ways to use these to complement one's work while staying legal (legal-ish)?
Correction: .webp is a raster format
Getting Started with Augmented Reality (AR) in Online Teaching and Learning i...Shalin Hai-Jew
University creative shops are exploring whether they can get into the game of producing AR-enhanced experiences: campus tours, interactive gaming, virtual laboratories, exploratory art spaces, simulations, design labs, online / offline / blended teaching and learning modules, and other AR applications.
This work offers a basic environmental scan of the AR space for online teaching and learning, and it includes pedagogical design leads from the current research, technological knowhow, hands-on design / development / deployment of learning objects, and online teaching and learning methods.
Techniques to optimize the PageRank algorithm usually fall into two categories. One is to try reducing the work per iteration, and the other is to try reducing the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged has the potential to save iteration time. Skipping in-identical vertices, with the same in-links, helps reduce duplicate computations and thus could help reduce iteration time. Road networks often have chains which can be short-circuited before PageRank computation to improve performance; final ranks of chain nodes can be easily calculated. This could reduce both the iteration time and the number of iterations. If a graph has no dangling nodes, the PageRank of each strongly connected component can be computed in topological order. This could help reduce the iteration time and the number of iterations, and also enable multi-iteration concurrency in PageRank computation. The combination of all of the above methods is the STICD algorithm [sticd]. For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
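As a concrete illustration of one technique above, skipping computation on already-converged vertices, here is a minimal sketch. The adjacency-list encoding, damping factor, and tolerance are illustrative assumptions; this is not the STICD implementation, and it ignores dangling-node handling for brevity.

```python
def pagerank_skip_converged(adj, damping=0.85, tol=1e-12, max_iter=100):
    """PageRank over an adjacency list {node: [out-neighbors]}; vertices whose
    rank change falls below tol are marked converged and skipped thereafter."""
    nodes = list(adj)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    # PageRank "pulls" rank from in-neighbors, so build a reverse adjacency once.
    rin = {v: [] for v in nodes}
    for u, outs in adj.items():
        for v in outs:
            rin[v].append(u)
    converged = set()
    for _ in range(max_iter):
        prev = rank
        rank = dict(prev)
        for v in nodes:
            if v in converged:
                continue  # saved work: this vertex's rank is frozen
            s = sum(prev[u] / len(adj[u]) for u in rin[v])
            rank[v] = (1.0 - damping) / n + damping * s
            if abs(rank[v] - prev[v]) < tol:
                converged.add(v)
        if len(converged) == n:
            break
    return rank
```

Freezing converged vertices trades a small amount of accuracy (a frozen rank no longer responds to late changes in its neighbors) for less work per iteration, which is exactly the tension the passage describes.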
Opendatabay - Open Data Marketplace.pptxOpendatabay
Opendatabay.com unlocks the power of data for everyone. Open Data Marketplace fosters a collaborative hub for data enthusiasts to explore, share, and contribute to a vast collection of datasets.
First ever open hub for data enthusiasts to collaborate and innovate. A platform to explore, share, and contribute to a vast collection of datasets. Through robust quality control and innovative technologies like blockchain verification, opendatabay ensures the authenticity and reliability of datasets, empowering users to make data-driven decisions with confidence. Leverage cutting-edge AI technologies to enhance the data exploration, analysis, and discovery experience.
From intelligent search and recommendations to automated data productisation and quotation, Opendatabay's AI-driven features streamline the data workflow. Finding the data you need shouldn't be complex. Opendatabay simplifies the data acquisition process with an intuitive interface and robust search tools. Effortlessly explore, discover, and access the data you need, allowing you to focus on extracting valuable insights. Opendatabay breaks new ground with dedicated, AI-generated synthetic datasets.
Leverage these privacy-preserving datasets for training and testing AI models without compromising sensitive information. Opendatabay prioritizes transparency by providing detailed metadata, provenance information, and usage guidelines for each dataset, ensuring users have a comprehensive understanding of the data they're working with. By leveraging a powerful combination of distributed ledger technology and rigorous third-party audits Opendatabay ensures the authenticity and reliability of every dataset. Security is at the core of Opendatabay. Marketplace implements stringent security measures, including encryption, access controls, and regular vulnerability assessments, to safeguard your data and protect your privacy.
Adjusting primitives for graph : SHORT REPORT / NOTESSubhajit Sahu
Graph algorithms, like PageRank, commonly operate on Compressed Sparse Row (CSR), an adjacency-list based graph representation that is compact and contiguous in memory.
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...John Andrews
SlideShare Description for "Chatty Kathy - UNC Bootcamp Final Project Presentation"
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
Explore our comprehensive data analysis project presentation on predicting product ad campaign performance. Learn how data-driven insights can optimize your marketing strategies and enhance campaign effectiveness. Perfect for professionals and students looking to understand the power of data analysis in advertising. for more details visit: https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/
2. PRESENTATION DESCRIPTION
• Virtually every type of online learning involves some type of data visualization. Some common data visualizations include timelines, process diagrams, line graphs, bar charts, pie charts, treemap diagrams, dendrograms, cluster diagrams, geographical maps, network graphs, word clouds, word networks, scatter diagrams, scatterplot matrices, intensity matrices, decision trees, and others. Indeed, there is also data in screenshots, photos, drawings, videos, or other types of visuals. Online dashboards contain rich data visualizations to convey dynamic data. Some data, such as big data, may only be conveyed in visuals for human understanding and interpretation; in raw form, the meaning is obscured and elusive. Data visualizations highlight salient aspects of data, and they have to be aligned for particular multi-uses: (1) user awareness and understanding, (2) data analytics, and (3) decision-making.
2
3. PRESENTATION DESCRIPTION (CONT.)
• This session defines some best practices for informative and engaging data visualizations for online learning. Original real-world examples are provided from modern software programs.
3
6. OVERVIEW
• Oversimplifications about Data, Information, and Data Visualizations
• Data Visualization Sampler (and Audience Interpretations)
• Data as Visualization
• Data Visualizations in Online Learning
• Defining “Effective” Data Visualizations
• Human Visual Perception
• Cognitive Theory of Multimedia Learning
• Steps to Creating Data Visualizations
• Conventions of Data Visualizations
• 2D
• 3D
• 4D
6
7. OVERVIEW (CONT.)
• Sequencing Data Visualizations
• Contextualizing Data Visualizations
• User Interactivity with the Data Visualizations
• About Data Visualizations and Decision-making
• About “Big Data” and Data Visualizations
• Some Quick Takeaways
• A Note about the Software
7
9. ABOUT DATA
• Anything that contains raw information
• May be structured (labeled data, such as in data tables)
• May be unstructured or semi-structured (such as imagery, text, audio, video, mixed media, and other non-traditional contents that contain informational value / extractable meaning)
• Structured data tend to follow the basics of row data as individual records and column data as attributes or variables
• Matrices may contain similar attributes in the columns (banners) and rows (stubs), depending on the type of matrix
9
10. ABOUT DATA (CONT.)
• Raw master files of data should be kept
• Data are generally parsed in the following ways: classification, frequency (or intensity), relationship [null, associational (negative or positive), causal, time relation (slice-in-time, over time, discrete time or continuous time, predictive or future-focused), and space relation], and others
10
11. ABOUT DATA (CONT.)
• Only as good as the sourcing and methods for collection
• Should be legally sourced and collected
• Should be accurately maintained and handled (with whatever levels of confidentiality required)
• Is only “good” for a particular time (but also valuable in time for historical purposes and base-lining and possible trend-line analysis)
• Is the base material on which research assertions and analyses are made
• May be grounds for fresh research hypothesizing
• May be a renewable resource in some circumstances and one-offs in others
11
12. ABOUT PUBLICLY SHARED DATASETS
• Full dataset may be shared online at the time of publication per grant funder requirements and some practices in some domains
• Shared dataset needs to be properly documented in terms of sources and methods and in terms of crediting
• This is often in a README file accompanying the data or included at the top-level of the dataset as a text box
• Data limitations and qualifiers need to be acknowledged
12
13. ABOUT PUBLICLY SHARED DATASETS
(CONT.)
• Data need to be cleaned:
• Repeated information should be omitted
• Outliers should be deleted (or mitigated)
• Data norming should be applied so that the meanings of disparate terms may be captured, and others
• Data may need to be re-structured for different types of data analytics in different software programs
13
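The cleaning steps above (dropping repeated information, norming disparate terms, handling outliers) can be sketched as a small routine. The record shape, column names, and z-score threshold below are illustrative assumptions; in practice, outlier handling depends on the data and the research design.

```python
from statistics import mean, stdev

def clean_records(records, value_key="score", norm_map=None, z_cut=3.0):
    """Basic dataset cleaning: drop exact duplicates, normalize disparate
    term spellings via a mapping, and remove z-score outliers."""
    norm_map = norm_map or {}
    # 1. Omit repeated information (exact duplicate records).
    seen, deduped = set(), []
    for rec in records:
        key = tuple(sorted(rec.items()))
        if key not in seen:
            seen.add(key)
            deduped.append(dict(rec))
    # 2. Norm disparate terms so the same meaning is captured under one label.
    for rec in deduped:
        for k, v in rec.items():
            if isinstance(v, str):
                rec[k] = norm_map.get(v.strip().lower(), v.strip().lower())
    # 3. Delete outliers beyond z_cut standard deviations from the mean.
    vals = [r[value_key] for r in deduped if value_key in r]
    if len(vals) > 1 and stdev(vals) > 0:
        m, s = mean(vals), stdev(vals)
        deduped = [r for r in deduped
                   if value_key not in r or abs(r[value_key] - m) / s <= z_cut]
    return deduped
```

A mitigation strategy (winsorizing, flagging) could replace the deletion step where outright removal would bias the analysis.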
14. ABOUT PUBLICLY SHARED DATASETS
(CONT.)
• Shared dataset data need to be properly labeled; the data need to be structured in conventional ways for ease-of-use and professionalism
• Columns as variables, rows as individual data entries
• Data need to be versioned in multiple formats for download and sharing
• Data need to be de-identified and made robust against re-identification (avoidance of data leakage)
14
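As one concrete (and deliberately simple) approach to the de-identification point above: direct identifiers can be replaced with salted-hash pseudonyms, so records remain linkable to each other but not back to a person without the secret salt. The field name and salt here are placeholders; real robustness against re-identification also requires handling quasi-identifiers (e.g., k-anonymity), which this sketch does not attempt.

```python
import hashlib

def deidentify(records, id_key="email", salt="replace-with-secret-salt"):
    """Replace a direct identifier with a salted-hash pseudonym so the same
    person maps to the same pseudo_id, but the raw identifier is removed."""
    out = []
    for rec in records:
        rec = dict(rec)                      # avoid mutating the caller's data
        raw = rec.pop(id_key, None)
        if raw is not None:
            digest = hashlib.sha256((salt + str(raw)).encode()).hexdigest()
            rec["pseudo_id"] = digest[:12]   # short, stable pseudonym
        out.append(rec)
    return out
```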
15. ABOUT INFORMATION
• Is an extraction from raw data, and is more processed (filtered, cleaned, selective) than raw data
• Contains some interpretation and framing
• Contains applied value for human use and benefit for awareness, decision-making, and other applications
• Should be accurate and avoid any sort of mis-representation, even by nuance or false inference
15
16. ABOUT DATA VISUALIZATIONS
• Are a purposive and selective data summarization (of the underlying data), and they generally contain particular dimensions or facets of the underlying data
• May be linked to the underlying data (for reproducibility)
• Involve titles, shape labels, callouts, data labels, keys / legends, and colors / shading (to disambiguate the information)
• May involve moved data to avoid occlusion
• May include picture-in-picture layout
16
17. ABOUT DATA VISUALIZATIONS (CONT.)
• Include visual aesthetic style elements
• Lines, shapes
• Color palettes
• Backgrounds
• Fonts
• Are usually stand-alone but also may be used in an original context (so may have dependencies)
• Follow particular data visualization conventions and common practices
17
18. ABOUT DATA VISUALIZATIONS (CONT.)
• May be 2D (x- and y-axes), 3D (x-, y-, and z-axes), and 4D (x- and y-axes and time; x-, y-, and z-axes and time as the 4th dimension)
• Should follow all laws
• Should respect intellectual property and not contravene IP rights
• Should also give credit where it is due
• Should respect privacy rights and not contravene privacy rights
• Should have legal and signed media releases for all depictions of people’s likenesses
• Should be accessible, with the information available in multiple modalities
18
19. ABOUT DATA VISUALIZATIONS (CONT.)
• May be drawn from different sources:
• raw data: structured, unstructured, semi-structured
• synthetic (faux) data
• processed information
• theory(ies)
• model(s)
• projection(s)
• concepts
• May be drawn from a combination of sources
• The underlying sources and the visuals inform understandings of the data visualization and the confidence that may be applied
19
20. ABOUT DATA VISUALIZATIONS (CONT.)
• May be created in a number of ways:
• manually drawn with diagramming tools, note-taking tools, tablet drawing programs and styluses
• may be pre-planned or drawn on-the-fly (spontaneously) in a freeform way
• drawn by machine based on both data and various computer algorithms
• statistical analyses (correlations, chi-square test, simple regression, multiple regression, t-tests, ANOVAs, sign tests, and others)
• cluster (similarity / dissimilarity) analysis
• machine learning or computational identification of patterns in data
20
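Two of the statistical procedures named above, correlation and the chi-square test, can be illustrated at the level of the raw statistics (p-value lookup omitted). This standard-library sketch uses invented data and is meant only to show the computations that statistical and visualization tools wrap.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def chi_square_stat(table):
    """Chi-square statistic of independence for a 2D contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    return stat
```

In a real workflow, these numbers feed directly into the visualizations: a correlation coefficient annotates a scatterplot, and a chi-square statistic accompanies a cross-tab heat map.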
21. ABOUT DATA VISUALIZATIONS (CONT.)
• May be created in a number of ways: (cont.)
• drawn by computer program (cont.)
• agent-based modeling
• data modeling
• simulation
• virtual immersive worlds, and others
• and often created with a mixed sequence, such as some computational data visualization augmented by manual data labels and other visual overlays
21
23. WHAT DO THE FOLLOWING DATA VISUALIZATIONS SHOW?
• The following data visualizations are based on education-seeded datasets and various software programs.
• The data sources include the following: curated text sets, LMS data portal data, social media datasets, crowd-sourced encyclopedias, non-consumptive text analysis data, and others.
• The data visualizations are labeled by the following: (1) data, (2) data visualization type, and (3) software technology.
23
24. GENERAL STEPS TO RESEARCH AND THE ROLES OF DATA VISUALIZATION
24
58. Narrowcasting vs. Broadcasting to Conceptual Audiences
(social image set coding, by hand; data coded by audience type and frequency)
Area Chart
Excel 2016
58
76. An IT Satisfaction Survey
Cross-Tabulation Analysis
(with chi-squared scores, p-values)
Qualtrics
76
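The cross-tabulation behind a slide like the one above amounts to counting response pairs into a contingency table. The field names ("dept", "sat") below are invented for illustration; Qualtrics performs this aggregation (plus the chi-squared scores and p-values) internally.

```python
from collections import Counter

def cross_tab(responses, row_key, col_key):
    """Build a cross-tabulation (contingency table) of counts from survey
    response records, e.g. satisfaction level by department."""
    counts = Counter((r[row_key], r[col_key]) for r in responses)
    rows = sorted({r[row_key] for r in responses})
    cols = sorted({r[col_key] for r in responses})
    table = [[counts[(ri, ci)] for ci in cols] for ri in rows]
    return rows, cols, table
```

The resulting table can then be passed to a chi-square routine to test whether the row and column variables are independent.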
77. THIRD-PARTY TOOLS ACTIVATED IN K-STATE INSTANCE OF CANVAS LMS IN DESCENDING ORDER
77
78. Third-Party Tools Activated in Canvas LMS at K-State
Statistic Chart / Pareto Chart (sorted histogram)
[items in descending order along x-axis; raw number counts (left) and percentages (right) on the 2 y-axes]
(orange curve a cumulative aggregation of items comprising the set)
MS Excel 2016
78
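The Pareto chart described above pairs descending raw counts with a cumulative-percentage curve. The computation behind it can be sketched as follows (the charting itself is left to Excel or a plotting library); the input dictionary is an invented stand-in for the tool-activation counts.

```python
def pareto_series(counts):
    """Given {item: count}, return items sorted descending with raw counts
    and the cumulative-percentage curve a Pareto chart overlays."""
    items = sorted(counts, key=counts.get, reverse=True)
    total = sum(counts.values())
    cumulative, running = [], 0
    for item in items:
        running += counts[item]
        cumulative.append(100.0 * running / total)
    return items, [counts[i] for i in items], cumulative
```

The raw counts drive the bars against the left y-axis, and the cumulative list drives the orange curve against the right (percentage) y-axis.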
87. VARIANT BILL-OUTS FOR SET OF INSTRUCTIONAL DESIGN PROJECTS ON A UNIVERSITY CAMPUS
87
88. Billing Data for Instructional Design Projects
Scattergraph with Lines
MS Excel 2016
88
89. Billing Data for Instructional Design Projects
Treemap
MS Excel 2016
89
90. TIME-TO-EVENT ANALYSIS (FORMERLY “SURVIVAL ANALYSIS”) OF INSTRUCTIONAL DESIGN PROJECTS AND TIME WHEN A PROJECT ACHIEVES EVENT (IS PAID OUT) OR IS CENSORED (DOES NOT ACHIEVE EVENT DURING THE RESEARCH PERIOD)
91. Instructional Design Billing Data
Kaplan-Meier Curve / Line Graph
(based on “survival analysis”)
IBM’s SPSS Statistics
92. Instructional Design Billing Data
Line Chart
(based on “survival analysis,”
non-descending stepwise curve)
IBM’s SPSS Statistics
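The Kaplan-Meier curves above come from SPSS; for readers curious about the mechanics, here is a minimal sketch of the product-limit estimator. The durations are invented; each observation pairs a time with whether the payout event was observed (True) or censored (False):

```python
# Minimal sketch of the Kaplan-Meier product-limit estimator.
# Each observation: (time, event_observed); event_observed=False means censored
# (the project did not reach payout during the research period).

def kaplan_meier(observations):
    """Return [(event_time, survival_probability), ...] as a step function."""
    at_risk = len(observations)
    survival = 1.0
    curve = []
    for t in sorted({time for time, _ in observations}):
        events = sum(1 for time, e in observations if time == t and e)
        censored = sum(1 for time, e in observations if time == t and not e)
        if events:
            survival *= 1 - events / at_risk  # product-limit update at each event time
            curve.append((t, survival))
        at_risk -= events + censored          # censored cases leave the risk set
    return curve

# Hypothetical project durations in months
print(kaplan_meier([(2, True), (3, True), (3, False), (5, True), (8, False)]))
```

Plotting these (time, survival) pairs as a non-ascending stepwise line reproduces the curve SPSS draws.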
100. Some Cities Visited over the Years
(by city, state, and country…
and numbers of visits)
3D Map (interactive, zoomable)
Microsoft Excel 2016 / Bing Maps
106. DATA -> VISUALIZATION
DATA
• What / Entity
• Frequency / Intensity (How Much?)
• Relationships (Association, Causation, Hierarchical, and Others)
• Slice-in-Time
• Changes over Time
VISUALIZATION
• Shape
• Size, Thickness, Height
• Connected Lines, Scatter in Space, Tree Structure Diagrams, and Others
• Time Label, Time Indicator
• Line / Scatter over the X-axis
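The data-to-visualization pairings above can be read as a simple lookup from data characteristic to visual encoding. The sketch below is purely illustrative; the mapping just restates the slide's two columns and is not a formal rule:

```python
# Illustrative sketch: mapping data characteristics to visual encodings,
# following the pairings suggested on the slide (not a formal standard).

DATA_TO_VISUALIZATION = {
    "entity": "shape",
    "frequency / intensity": "size, thickness, height",
    "relationships": "connected lines, scatter in space, tree structure diagrams",
    "slice-in-time": "time label, time indicator",
    "changes over time": "line / scatter over the x-axis",
}

def suggest_encoding(data_aspect):
    """Return a candidate visual encoding for a named data aspect."""
    return DATA_TO_VISUALIZATION.get(data_aspect.lower(), "no convention listed")

print(suggest_encoding("Changes over time"))  # line / scatter over the x-axis
```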
108. COMMON FORMS
• Timelines
• Bar Charts, Pie Charts, Line Charts, and
Others
• Models (Venn Diagrams, Figures)
• Geographical Maps
• Photos / Imagery
• Simulations
• 3D Immersive Virtual Worlds
• 4D Immersive Virtual Worlds
• Games
• Video
109. ONLINE LEARNING CONTEXTS
• Online learning includes both term-length courses and short courses (such as for trainings).
• In online learning, instructors maintain a level of telepresence through interactions and
intercommunications with learners.
• Learners maintain some level of inter-communications with their peers. They co-create
learning communities to support each other’s learning.
• In an online learning context, learners have to be somewhat self-driven and self-directed.
• Given that online learning occurs via the Web and Internet, learners have easy access to online
resources: digital libraries, websites, immersive virtual spaces, online datasets, and other
contents.
• Depending on the sociability of others, they will have access to experts and peers to engage
with about various topics.
110. ONLINE LEARNING CONTEXTS (CONT.)
• The nature of the online learning context means that online learners will have
access to other datasets and data visualizations related to the same
information…and other perspectives and points-of-view.
• Ostensibly, they’ll be able to see if data visualizations are borrowed and reproduced
from elsewhere (through reverse image search, through basic Web image search).
• They’ll be able to access public datasets.
• They’ll be able to see if there are different datasets, data visualizations, and different
understandings and interpretations of the issue.
111. REQUIRED LEARNER RESPONSES
• The data visualization(s) need to be designed so that learners do the following:
• Pause, not just blitz past
• Engage with the visualization (and interact for the interactive visualizations)
• Extract accurate meaning for the learning
• Reflect
• Follow-through on learning activities
• Experience inspiration
113. EFFECTIVE DATA VISUALIZATIONS…
• represent the selected underlying data accurately based on the inherent form
and structures in the underlying data and on user needs (and control against
misperceptions and misunderstandings);
• highlight relevant aspects of the underlying data;
• convey information in an aesthetically pleasing way (to attract human attention
and to increase the memorability of the visualization and the underlying
information);
• align with conventions of the respective data visualizations (directionality of
reading, respective sizes of elements, placement of elements in relation to each
other, naming and labeling protocols, perspective, and other aspects);
114. EFFECTIVE DATA VISUALIZATIONS…
(CONT.)
• maintain consistency both within and across related data visualizations;
• are accessible in terms of element labeling, text readability, image resolution,
and uses of color [proper contrast, proper color palettes, applied fill, and
way(s) to convey information beyond color];
• are presented in a contextualized way, including access to information about
the underlying research, data collection, and data cleaning;
• avoid unnecessary (read: purely decorative, non-information-bearing)
elements, and
• occasionally connect to the underlying data (data portals, interactive web-
based data visualizations), among others.
116. SOME MECHANICS OF
VISUAL PERCEPTION
• The human visual perception system includes the eyes (cornea, lens, and retina), the
optic nerves, and visual paths in the brain to process light information.
• The retina contains 150 million light-sensitive rod and cone cells
• In the brain, there are hundreds of millions of neurons that process visual information (“and
take up about 30 percent of the cortex, as compared with 8 percent for touch and just 3
percent for hearing”)
• Optic nerves consist of “a million fibers” each (Grady, June 1, 1993, “The Vision
Thing: Mainly in the Brain,” Discover)
• Based on the eye’s structure, its focal vision is powerful, but peripheral vision is
very limited.
117. PRE-ATTENTIVE PROCESSING
• The human visual perceptual system first captures visual information in a pre-attentive
and subconscious way (“Pre-attentive processing,” Nov. 29, 2016).
• Based on interest and training, a person may then focus attentively on the visual
stimulus.
• Subconsciously and unconsciously acquired details of the world can affect the person
and his / her decision-making whether he / she is consciously aware of the details or
not.
118. VISUAL SIGNALS BEYOND THE PHYSICAL
• Perceptual signals do not only come from the world but also from the mind
and body (internally).
• Vision, though, is informed by the prior experiences (prior observed patterns)
of the individual.
• One researcher suggests that in visual perception: 40% comes from visual signals, and
60% comes from prior experiences and memory (Catmull, 2014, Creativity, Inc.:
Overcoming the Unseen Forces that Stand in the Way of True Inspiration, p. 178).
119. EIDETIC MEMORY?
• “Eidetic” memory refers to the ability to recall mental images with high detail.
• Some people, particularly a subgroup of children, are able to view memories
like photos for some minutes.
• Photographic memory, though, has not been established empirically and is not
currently thought to exist.
120. A VISUALLY DETAILED AND
INFORMATIVE WORLD
• Human visual imagistic representations of the world (in the mind) are not that
inherently informative.
• Human visual memory seems so powerful because the world itself serves as an
“outside memory” (O’Regan, Sept. 1992).
• A common eloquent expression of this idea is that “the world is its own memory.”
121. VISUAL THINKING
• “Visual thinking” refers to human intelligence and imagination which enables
people to conceptualize in imagery, not just language.
• “Visual literacy” refers to the ability to discover meaning from imagery.
• There is research that people interpret artworks in a predictable manner, even
across “a wide range of cultural and socioeconomic contexts” (Housen, 1992a,
2000, 2002; Housen, DeSantis, & Duke, 1997, as cited in Housen, 2007, p. 2),
which may suggest a hard-wired biological basis.
• The stages are as follows:
• (1) “accountive” with “simple, concrete observations”;
122. VISUAL THINKING (CONT.)
• The stages are as follows (cont.):
• (2) “constructive” based on perceptions,“knowledge of the natural world,” “the values
of their social and moral world,” with observations based on known reference points;
• (3) “classifying” with viewers acting as “art historian” by placing the artwork in a
context of conventions and art history canons;
• (4) “interpretive” based on “interactive and spontaneous” encounters with the
artwork, and
• (5) “re-creative” by reflecting about art and suspending belief in order to see the
work as “semblant, real, and animated with a life of its own” (Housen, 2007, pp. 3 – 8)
123. PRIOR EXPERIENCES
WITH DATA VISUALIZATIONS
• If people’s visual systems are trained by the human built environment and their
exposure to familiar forms, so, too, are people’s systems trained by prior exposures
to data visualizations.
• Some common expectations for data visualizations:
• Start at the top and read down. Start at the left and read right.
• Size means visual salience and importance.
• Color and boldness means visual salience and importance. Bright colors are warning colors.
• Movements (changing numbers, scrolling data, and others) are attention-getting.
• Eye movements often track with shapes and lines.
124. SOME IMPLICATIONS
FOR DATA VISUALIZATIONS
• Data visualization conventions should be followed.
• Human tendencies to read stories and meanings into every element of a data
visualization should be understood and supported. This means that no excess or
misleading information should be included.
• The human eyes’ capabilities to detect nuance should be catered to. It may be
helpful to add gridlines and other details to enhance understanding of a graph.
• Whatever visual elements in a data visualization should work together
harmoniously, and they should not clash or engage competitively for human
attention.
• All measures should be consistently applied across the data visualization.
126. MAIN THEORISTS AND THEORIES
• Richard Mayer’s Cognitive Theory of Multimedia Learning (2002):
Engaging cognitively involves costs to the learner.
• (1) Intrinsic cognitive load is related to the difficulty of the topic-to-be-learned.
• (2) Extraneous cognitive load is based on how information is designed and
presented.
• (3) Germane cognitive load is dependent on “the processing, construction and
automation of schemas” (schemas being frameworks for understanding parts of the
world). There are ways to design multimedia to align with human cognitive limits to
lighten cognitive loads to enhance learning.
127. MAIN THEORISTS AND THEORIES (CONT.)
• John Sweller’s Cognitive Load Theory (1988): “Means-ends analysis”
imposes a high cognitive load, and those who teach can lighten the
load for learners by offering organizing schemas, “worked examples,” and
“goal-free problems.”
• Allan Paivio’s Dual-Coding Theory (1960s / 1971): Humans process
information through separate auditory and visual channels. Verbal (word,
symbolic) and non-verbal (visual image) information is processed in different
channels.
128. IMPLICATIONS ON DATA
VISUALIZATION DESIGN
• Cognitive Theory of Multimedia Learning
• Complex topics should be unpacked in a clear way to limit intrinsic cognitive load.
“Extraneous cognitive load” should be avoided through effective design.
• Data visualizations should never be purely decorative (because decorative-only
visuals may be distracting).
• Data visualizations should have main relevant aspects highlighted and noted, to lower
germane cognitive load. Learners should not be given confounding data visualizations
without clear meanings.
• Cognitive scaffolding should be designed into the data visualizations about topics with
high intrinsic cognitive load.
129. IMPLICATIONS ON DATA
VISUALIZATION DESIGN (CONT.)
• Cognitive Load Theory
• Data visualizations should be placed in the context of a relevant framework in a
particular learning domain or context. The data should be presented in the context
of accepted schemas.
• Ambiguity requires cognitive load to process, so if learners need to apprehend a data
visualization right away, it should be presented as a “worked case.” Problems, when
presented, should be “goal free” and pre-solved in many cases.
130. IMPLICATIONS ON DATA
VISUALIZATION DESIGN (CONT.)
• Dual-Coding Theory
• Data visualizations presented to learners should not rely on purely verbal or purely
non-verbal channels.
• There should be a balance in the modality of information, so learners can process the
information appropriately.
• There are contested ideas about how much redundancy across channels should be
deployed to convey information. Too little coding may leave the learner with
insufficient information; excessive redundancy may cause expensive cognitive overload
with unnecessary excess.
132. REVIEW: 10 STEPS TO CREATING DATA
VISUALIZATIONS
1. Analyze the data
2. Clean / process the data
3. Select the data aspect(s) to highlight
4. Structure the data for the visualization
5. Create initial data visualizations
6. Analyze the data further
7. Add data labels, title, key / legend, and other elements
8. Pilot-test the data visualizations (stand-alone)
9. Pilot-test the data visualizations (in context)
10. Finalize the data visualizations
(as seen on Slide 25)
133. DEBRIEFING THE SEQUENCE
• A data visualization begins with intimate
knowledge of the underlying data.
• Data often has to be processed in the
correct format for visualization.
• Data visualizations are used partially as a
data exploration method.
• Data are often processed in multiple
different methods…and even in multiple
different software programs in order to
see what may be learned from the data.
• Depending on aesthetics, some may
process data in one tool and export the
resulting data tables and / or other digital
artifacts for final processing in other
software programs.
• There are data visualization drafts
created before a final one is output (for
presentation).
• Data visualizations have to be human-
readable and human-usable.
134. MORE TO THE STORY…
• To create relevant data visualizations, those who would design data visualizations
need to understand the following:
• the underlying data and prior research
• the statistical assumptions
• the conventions of the particular data visualizations
• the target audiences (and the incidental audiences)
• the socio-cultural and geographical backgrounds of the target and incidental audiences (in
order to avoid miscommunications and potential offense)
• the requirements (color processing, resolution, and others) and technical versions of the
imagery needed for digital distribution and print
135. QUALITY STANDARDS FOR DATA VISUALIZATIONS
RESEARCH STANDARDS…
• Following legal standards for research and data collection, including professional oversight, informed consent, candor, benevolence, and others
• Following legal standards for data handling and storage
• Following legal standards for privacy protections of research participants (and data)
• Following legal standards for information accuracy (and controlling for negative understandings)
• Following professional guidelines for integration of mixed data from various datasets
• Minimizing interpretive skew
136. QUALITY STANDARDS FOR DATA VISUALIZATIONS (CONT.)
HISTORICAL ACCURACY
• Ensuring that the research work was as solid as possible given the contemporaneous limits of time, talent, treasure, technologies, and methods
• Ensuring that the data may be used in the future, based on future-created capabilities
PROFESSIONAL USE
• Using data and data visualizations in ethical and professional ways
• Providing benefit in the deployment of data and data visualizations
• Avoiding harm in the deployment of data and data visualizations
• Providing full disclosure in the provision of information
137. QUALITY STANDARDS FOR DATA VISUALIZATIONS (CONT.)
INTELLECTUAL PROPERTY (IP)
• Creating original contents using materials and data that one has legal rights to use
• Using software that is legally acquired
• Giving credit where it is due (such as in cases of open-source and / or Creative Commons-released materials)
• Avoiding contravening others’ intellectual property
• Doing due diligence to identify ownership of works (even for “orphaned” works)
PRIVACY PROTECTIONS
• Acquiring informed consent from all participants in research (and maintaining accurate and up-to-date documentation of these permissions)
• Acquiring media releases for uses of people’s likenesses (such as for audio, video, and other recordings and captures)
• Protecting data (both in transit and at rest)
• De-identifying data where necessary (to the standard that re-identification is not possible)
138. QUALITY STANDARDS FOR DATA VISUALIZATIONS (CONT.)
ACCESSIBILITY…
• Ensuring that all data visualizations are available to users in multi-modal channels (visual, textual / audio)
• Channels should offer equal informational value
• Ensuring that 4D data visualizations (with the time element) may be controlled by users (so the timing may be slowed or stopped, for easier usage)
• Ensuring that data tables may be read coherently by screen readers
• Ensuring that color is not used as the only channel for information conveyance (for those with color-blindness)
• Using high-contrast colors to enable accurate visual uptake of information, and others
139. QUALITY STANDARDS FOR DATA VISUALIZATIONS (CONT.)
REPRODUCIBILITY
• Enabling access to the underlying data behind data visualizations
• Enabling contemporary and future researchers to explore the data for accuracy and applicability to other contexts (and through other interpretive lenses)
• Enabling multiuse data
REPEATABILITY
• Enabling other researchers to go through the same steps as the original researcher to come out with the same results from the dataset(s)
140. ADDITIONAL COMMON ERRORS IN
DATA VISUALIZATIONS
The Data
• Introducing error in data handling, data processing, and / or data cleaning
• People who work too quickly may accidentally delete or corrupt information if they
are careless in their work
• Using an unaligned data visualization type for the underlying data
• It’s easy to get a software program to output a data visualization without actually
understanding what is going on with the data or in the software
141. ADDITIONAL COMMON ERRORS IN
DATA VISUALIZATIONS (CONT.)
The Data (cont.)
• Using high-density data that may overpower the data visualization
• Excessive nodes in a network will make the network unreadable
• Insufficient understanding of the limits of assertions that may be made with
that visualization
• Not remembering that data visualizations are summary data, not comprehensive (in
most cases)
• Not remembering that data visualizations are inherently ambiguous and polysemous
(multi-meaninged) and can be interpreted in different ways by different beholders
142. ADDITIONAL COMMON ERRORS IN
DATA VISUALIZATIONS (CONT.)
The Data (cont.)
• Labeling data visualization elements incorrectly
• Insufficient labeling of visual elements (such as data labels) in the data visualization
• Incorrect labeling of data elements (confusing rates over time with set amounts)
• Using mixed measures in data
• Not using consistent time measures
143. ADDITIONAL COMMON ERRORS IN
DATA VISUALIZATIONS (CONT.)
The Data (cont.)
• Not considering language
• Simple English reads better and translates better
• Parallel construction should be applied to all language use in a data visualization,
especially since language is so sparse and powerful in a data visualization
• Spelling should be correct
• The language used should be consistent and aligned with the research context
144. ADDITIONAL COMMON ERRORS IN
DATA VISUALIZATIONS (CONT.)
The Software
• Using software without understanding the software
• Researchers will use software programs without reading the manuals and the underlying
documentation (or they’ll go to forums before they go to the actual documentation)
• Of course, some software makers do not document as well as they should (most will not
reveal underlying algorithms, for example)
• Researchers need to understand the software programs they’re using, particularly for
coding and analysis
• They need to represent what they learned while using the software, not just mention that
they used the software (as if that would lend their work credibility)
• Shabby work reads as shabby, and name-dropping a software tool will not make things
better
145. ADDITIONAL COMMON ERRORS IN
DATA VISUALIZATIONS (CONT.)
Data Visualization Conventions
• Not understanding the conventions of a data visualization
• Spatial relationships in 2D, 3D, and 4D planes
• Oftentimes, people will combine 3D and 2D, breaking the illusion of the z-axis
• Shapes and meanings
• Color applications
• Lines: thickness, color, interruptions, line ends, and others
• Symbology and iconography
• Textures and patterns
146. ADDITIONAL COMMON ERRORS IN
DATA VISUALIZATIONS (CONT.)
Contextual Details
• Not offering sufficient contextual details to fully understand a data visualization
• Not including data parameters for data processing in a data visualization
• Not indicating that a data visualization is conceptual vs. empirical
• Not labeling synthetic or faux data as such
• Misrepresentations of information
147. ADDITIONAL COMMON ERRORS IN
DATA VISUALIZATIONS (CONT.)
Going Glam / Not Going Glam
• Using data visualizations that are glamorous (read: 3D) but which
misrepresent data
• Misplacement of data on the x, y, or z axes
• Occlusion of visual data
• Not considering aesthetics
• Using mixed color palettes (or using colors without any consistency or strategy)
• Using poor aspect ratio (stretching data visualizations)
• Not designing for white space (by overloading a data visualization)
148. ADDITIONAL COMMON ERRORS IN
DATA VISUALIZATIONS (CONT.)
Audience Needs
• Not fully considering audience needs [such as by running pilot tests; such as
building for both expert and general audiences (simultaneously)]
• Visual perception needs; cognitive and symbolic processing needs (symbols, language);
accessibility needs
• Learner developmental stage needs (with implications for the data visualization and
the sequence of related data visualizations)
• Informational needs
• Technological needs (such as viewing data visualizations on mobile devices and
smartphones with small screens)
149. ADDITIONAL COMMON ERRORS IN
DATA VISUALIZATIONS (CONT.)
Designing for Usage Contexts
• Incomplete consideration of various contexts in which the data visualizations
may be used
• Insufficient consideration for both stand-alone (the disaggregation of elements in
online learning) and in-context usages of the data visualization
• And others
150. QUALITY APPROACHES
TEMPLATING
• For those working on projects, it helps to…
• define comprehensive data visualization standards early in a project stylebook
• use prototypes of data visualizations and images and test these with people who are similar to those who will ultimately consume the data visualizations
• create evolving data visualization templates for use during the lifespan of the project
CLEAR DOCUMENTATION AND STORAGE OF RAW FILES
• It is important to keep clear documentation of all work and how the data visualizations were created
• It is important to keep all raw files, especially data ones, for re-do’s as needed
152. SOME COMMON DATA VISUALIZATION
CONVENTIONS
• There is a sense that there is an optimal amount of data for a particular data
visualization. Excessive data makes a data visualization hard to read or confusing;
too-sparse data makes a data visualization feel incomplete.
• Data visualizations are generally read from top-to-bottom and left-to-right.
• Timelines are read either from top-to-bottom or left-to-right, for example.
• In hierarchical data visualizations, there are typical ways to interact with them, from general
to specific or specific to general.
• Sunburst diagrams are generally read from general to specific.
• Dendrograms are generally read from leaf to branch to trunk to root (specific to
general).
153. SOME COMMON DATA VISUALIZATION
CONVENTIONS (CONT.)
• If there is a sequence of data visualizations, these are usually presented from
simple to complex.
• The opposite sequence can also be applied.
• In linear regressions, the x-axis is usually time, and the y-axis is the variable.
Or, both axes can be variables.
• Data may be engaged with at varying levels of granularity. Less specific data
may not be labeled, but at finer levels of granularity, data labels are often used.
• Data tables may be published out with the data visualization because of the
imprecision of human vision in assessing finer distinctions in visualizations.
154. SOME COMMON DATA VISUALIZATION
CONVENTIONS (CONT.)
• Time is often an important part of a data visualization, whether time is treated
as discrete (slice-in-time), periodic (in phases), or as continuous. Time is an
important variable in all research.
• Data visualizations are usually named (titled) for easier reference.
• Titles are generally factual and descriptive.
• They are typically in the form of noun phrases.
• Some titles point to the main gist of the data visualization.
• Titles are usually written in title case, with all main words capitalized and prepositions
and articles in lower case.
155. SOME COMMON DATA VISUALIZATION
CONVENTIONS (CONT.)
• Some data visualizations are offered along with underlying datasets that inform the
data visualization—for reproducibility of the data visualization (and for enriched
research using the shared data).
• In some cases, datasets are offered along with the R or other high-level computer language
script used for the data visualizations, so users may experience the data visualizations in
interactive ways.
• If external data are used, the data source should be cited. (Many publicly shared
datasets come with desired citations. Some of these may have to be tweaked to
follow the proper citation method of the target publisher.)
• If external data are processed or intermingled with other data, that should be done
with finesse (so as not to corrupt the data). How that was done should be clearly
documented.
156. SOME COMMON DATA VISUALIZATION
CONVENTIONS (CONT.)
• Aesthetically, data visualizations are created with a proper balance of filled-in spaces
and white spaces.
• Borders and edges are not usually included, but site designers and publication
production personnel decide applications during the design process.
• Color palettes are deployed for both aesthetics and for accessibility.
• Color palettes may be polychromatic (multi-colored) or monochromatic (one color in
different shades).
• Sufficient color selection (avoiding colors that are imperceptible for those with color
blindness challenges) and proper contrast are important for visual accessibility.
• Color should never be the only conduit for information; labels should be used strategically
to convey meaning.
• Color (arrayed along the light spectrum) affects viewer perceptions, including their moods.
• Cultural backgrounds may also affect the inherent meaning in colors.
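The "proper contrast" point above can be checked computationally. The sketch below implements the WCAG 2.x relative-luminance and contrast-ratio formulas, which are not part of these slides but are a widely used accessibility standard:

```python
# Sketch of the WCAG 2.x relative-luminance and contrast-ratio formulas,
# often used to verify that data-visualization colors have proper contrast.

def relative_luminance(rgb):
    """rgb: (r, g, b) with channels 0-255; returns relative luminance per WCAG 2.x."""
    def linearize(channel):
        c = channel / 255
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = (linearize(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(rgb1, rgb2):
    """Contrast ratio between two colors; WCAG AA asks for >= 4.5:1 for normal text."""
    lighter, darker = sorted([relative_luminance(rgb1), relative_luminance(rgb2)], reverse=True)
    return (lighter + 0.05) / (darker + 0.05)

print(round(contrast_ratio((0, 0, 0), (255, 255, 255)), 1))  # 21.0 (black on white)
```

A designer could run candidate palette pairs through `contrast_ratio` before committing them to a chart.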
160. SOME COMMON DATA VISUALIZATION
CONVENTIONS (CONT.)
• Most data visualizations use only one or two font styles.
• Font sizes tend to be within a certain size range, so that there are not huge
differences in sizes, particularly for shared and similar types of data.
• Texts in data visualizations are hierarchical and structured (even though they’re not
generally tagged within the data visualization currently).
• The location (position) of the text may be indicative of its importance.
• The larger the font, the more important the data.
• The font sizes of titles may be quite a bit larger than other font sizes used in a
data visualization because of the title’s central role in the visualization.
161. SOME COMMON DATA VISUALIZATION CONVENTIONS (CONT.)
• Font types in data visualizations tend to be sans serif for easier readability.
• Whenever possible, it’s a good idea to have labeling text right-side-up for readability.
• In some rare cases, it is allowable to have text in various directions, but this is not
generally desirable.
• (Note: Most software programs that enable data visualizations will have some helpful
presets. Data designers break the presets at their own risk.)
162. SOME COMMON DATA VISUALIZATION
CONVENTIONS (CONT.)
• More complex data may be represented in interactive data visualizations.
• For example, data visualizations in software programs, data dashboards, and some websites may
offer the ability to access the underlying data.
• Some simulations enable the input of different data parameters in order to see how these differing
inputs can affect outputs.
• When interactive data visualizations are exported as static images, there is always capability
loss and related data loss. (To preserve the information, it would help to export multiple static
images along with discussion points.)
• Data visualizations should be optimized to the various modes of usage (in print, on screen, and
so on). To these ends, these should be versioned for proper resolution, color type, file types,
and so on. For many print publications, only b/w or grayscale are applied to data visualizations
(because of the prohibitive costs of using various ink colors).
163. SOME COMMON DATA VISUALIZATION
CONVENTIONS (CONT.)
• Data visualizations may be used in whole or in part.
• Data visualizations may be designed for audience needs in two ways:
• To fulfill the needs of a narrow audience focused on specific information from the
data
• To fulfill the needs of a broad audience based on a wide range of user needs from the
data
164. SOME COMMON DATA VISUALIZATION
CONVENTIONS (CONT.)
• In publication, data visualizations are not usually bylined within-image.
• Bylines may be given in the publication, unless the author of the work created
the data visualizations.
• All datasets and data visualizations have limitations, and it is better to have
such limitations addressed as part of the publication process.
• This is addressed in the “delimitations” section. Data qualifiers should be included
with the data visualization. The level of confidence linked to the underlying data and
data visualizations should be addressed.
166. BASICS ABOUT 2D DATA
VISUALIZATIONS
• 2D data visualizations exist on a flat two-dimensional plane.
• The planes are usually squares or rectangles (quadrilaterals). Within the area,
various types of data visualizations may be displayed.
• Generally 2D data visualizations are understood to have an x-axis and a y-axis
(such as linear regression graphs, bar charts, and others). In some cases, the x-
and y- axes do not apply since the visualization may be rotated and maintain
the same meaning (such as some forms of network graphs, bubble diagrams,
and others).
168. BASICS ABOUT 3D DATA
VISUALIZATIONS
• Three-dimensional (3D) data visualizations are drawn in a space that involves
volume (not just area) and three dimensions: an x-axis, a y-axis, and a z-axis.
• In many software tools, the 3D effect is created with shading and the
appearance (illusion) of a third dimension.
• Such visualizations tend to be rotate-able and zoomable for clarity.
• People are not thought to process 3D data very well because of challenges
with occlusion and visual ambiguity.
• Often, 3D visualizations may also be offered in 2D.
170. BASICS ABOUT 4D DATA
VISUALIZATIONS
• The fourth dimension is conceptualized as time. For data visualizations, this
means changes over time.
• Changes over time may be seen in spaces that are two-dimensional or three-
dimensional.
• Time may be discrete (a particular slice-in-time), phased (into periods), or
continuous.
• Time may be presented in sequential order or reverse-sequential order, in terms of
phased or continuous time.
• In data visualizations, time may be run forwards or backwards in simulations,
virtual immersive worlds, and video.
172. ORDER AT THE MICRO-LEVEL
• In processing data for a data visualization, there are common micro-level
organizational aspects. They may include the following:
• alphabetization (letter order)
• numerical order (positioning in a list, rank, others)
• simple to complex, complex to simple (complexity)
• smallest to largest, largest to smallest (size)
• chronological date order, reverse chronological data order (date)
• top to bottom, bottom to top, left to right, right to left, outside in, inside out (spatial)
• most to least, least to most (amount)
• categorization (type)
• These ideas apply to the organization of data visualizations as well, in terms of
providing guidance on how such visualizations may be sequenced.
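The micro-level orderings above map naturally onto sort keys. A small illustrative sketch (the items and their fields are hypothetical, not drawn from any dataset in this deck):

```python
# Sketch: several of the micro-level orderings above expressed as sort keys.

from datetime import date

items = [  # hypothetical LMS features with usage counts and launch dates
    {"name": "quiz", "count": 40, "added": date(2017, 3, 1)},
    {"name": "forum", "count": 75, "added": date(2016, 5, 9)},
    {"name": "badge", "count": 12, "added": date(2017, 1, 20)},
]

alphabetical = sorted(items, key=lambda i: i["name"])                  # letter order
most_to_least = sorted(items, key=lambda i: i["count"], reverse=True)  # amount
chronological = sorted(items, key=lambda i: i["added"])                # date order

print([i["name"] for i in alphabetical])   # ['badge', 'forum', 'quiz']
print([i["name"] for i in most_to_least])  # ['forum', 'quiz', 'badge']
print([i["name"] for i in chronological])  # ['forum', 'badge', 'quiz']
```

The same keyed-sort idea covers size order, complexity order, and reverse-chronological order (via `reverse=True`).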
173. SOME ORDER PREFERABLE
• Data visualizations may be presented in a particular order or sequence.
• The order may be somewhat serendipitous only in terms of placement in a slideshow,
in a book, on a web page, and so on.
• The order may be purposeful to highlight some macro- or micro-level observation
about the data.
• No matter how the presentation order comes about, it helps to have an
underlying rationale or logical trajectory for the sequence.
• Even if viewers do not notice the organizational logic in the sequence, the learning is
made easier by having some order.
• This small section provides some ideas for the data visualization sequencing.
174. ORDER: SIMPLE-TO-COMPLEX
• Data visualizations may be presented in a simple-to-complex way, to bring
observers along with the flow of the data revelations.
• Simple pieces may be offered first to build up to a complex summary data
visualization, for example.
• Or, the sequence may begin with a complex visualization and then offer
simpler zoomed-in views to support more in-depth discussion and insights.
175. ORDER: FEATURE-BASED,
GENERAL-TO-SPECIFIC
• Most datasets today are multi-dimensional and complex. One way to sequence
data visualizations is to focus on different aspects or features of the dataset.
• It may be helpful to create an over-arching structure of the dataset’s features
and use those to organize the data visualizations.
• This is the general-to-specific, top-down, and deductive approach.
• For example, if datasets involve a learning management system, would it be
helpful to organize the data visualizations by the data dictionary? The various
features of the LMS from most commonly used to the least commonly used?
The features by role (student, faculty, advisor, instructional designer, librarian,
and administrator)?
176. ORDER: FEATURE-BASED,
SPECIFIC-TO-GENERAL
• Another way is to start with the minutiae and details and broaden out.
• This is the specific-to-general, bottom-up, and inductive approach.
• This approach can be used to build interest and suspense…as to where the
details are leading.
• For example, in a study of social image sets, it is possible to code the imagery
to different categories first (in an emergent way, without a priori assumptions),
and then identify data patterns in the imagery…and then hypothesize from the
empirical data. The data visualizations can move from the details to the over-
arching macro structures in that sequential order.
177. ORDER: TIME-BASED
• Data visualizations may show a phenomenon changing over time.
• In this case, time is usually chronological.
• The changes may be a factor of time, a factor of an intervention or multiple
interventions, a factor of a process, or other factors.
• Time itself may be discrete, phased, or continuous.
• The time may be in sequential order, reverse-sequential order, or some mix of
phasing.
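The discrete, phased, and continuous treatments of time described above can be sketched as follows; the event dates and values are hypothetical:

```python
# A sketch of time as discrete, phased, or continuous/sequential,
# using hypothetical (date, value) events.
from datetime import date

events = [
    (date(2016, 1, 5), 12),
    (date(2016, 4, 20), 30),
    (date(2016, 9, 2), 18),
    (date(2017, 2, 11), 45),
]

# Discrete: a particular slice-in-time.
slice_2016_04_20 = [v for d, v in events if d == date(2016, 4, 20)]

# Phased: bucket events into periods (here, by year).
phases = {}
for d, v in events:
    phases.setdefault(d.year, []).append(v)

# Sequential: sort by date (reverse=True would give reverse-sequential order).
sequential = sorted(events, key=lambda e: e[0])

print(slice_2016_04_20)            # [30]
print(phases)                      # {2016: [12, 30, 18], 2017: [45]}
print([v for _, v in sequential])  # [12, 30, 18, 45]
```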
178. ORDER: SPATIAL,
ZOOMING IN- AND OUT-
• Data visualizations often contain complexity.
• Another organizational sequence may involve the following based on spatial
and scale views:
• Zooming-in to a data visualization for deeper micro understandings
• Zooming-out from a data visualization for deeper macro understandings
179. ORDER: AMOUNT OR INTENSITY
• The “most-to-least” (descending order) and “least-to-most” (ascending order)
approach enables a sense of substance.
• For example, the most frequent (mode) word in a text set may be introduced in a
data visualization, whether that word was identified via a word frequency count, a
topic model, or some other method.
• Then, data visualizations showing other highlighted terms in descending order may be
introduced.
• Then maybe insights from the long tail in the text corpora may be introduced.
• This whole theoretical sequence is in descending order, from most to least.
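The most-to-least sequence above can be sketched with a simple word frequency count; the toy text set below is hypothetical, and `collections.Counter` is Python standard library:

```python
# A minimal sketch of a "most-to-least" (descending) sequence: a word
# frequency count over a toy text set, so the mode word leads and the
# long tail trails.
from collections import Counter

texts = [
    "learning online learning data",
    "data visualization data learning data",
]
words = " ".join(texts).split()
counts = Counter(words)

for word, n in counts.most_common():
    print(word, n)
# data 4
# learning 3
# online 1
# visualization 1
```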
180. ORDER: THEORETICAL AND ACTUAL
• Another sequence may begin with a concept or model or some theoretical
conceptualization followed by empirical and actual data.
• This is the top-down approach, beginning with the general and moving to the specifics.
• Or, the sequence can go the other way, with observations from the real world…and
moving to a more general data visualization.
• This is more of a bottom-up approach, beginning with specifics and moving to the
general.
182. A DATA VISUALIZATION “SURROUND”
• Data visualizations may be presented not only as stand-alone visualizations but
within a context or surround.
• A most close-in aspect of context may involve the data visualization directly.
• An important aspect of context involves the backstory behind the data
visualization.
• Where did the data come from? What sort of research was conducted in order to
capture the data? How was the data cleaned and processed?
• If datasets were mixed, where did the data come from? How were the datasets
mixed? Who should be credited?
• What are some qualifiers that need to be applied to the data visualizations?
183. SOME BENEFITS OF PROVIDING
“CONTEXT”
• If designed properly, a context for a data visualization achieves the following:
• enriches the data
• provides direction for proper interpretation of the data (highlights what “story” the data
are telling)
• suggests the relevance of the data in the real world
• raises interest about the data visualization(s)
• offers access to the underlying dataset
• provides ideas about where to acquire more relevant information about the related data
• gives credit where it is due for the data visualization, the dataset, the research, and other
related information, and others
184. ELEMENTS OF “CONTEXT”
• At a superficial level, data visualization “context” involves the lead-up and lead-away
text surrounding the data visualization.
• This may include stories to “set up” the phenomenon under study.
• This may include table data and downloadable datasets.
• This may include captioning, credits, research citations, and other details.
• This may include qualifiers.
• This may include lead-up multimedia (audio, video, and others) to prime learners to
understand the data visualization.
• There may be a lead-up or lead-away interview by the researchers or data analysts
or others related to the work.
185. “CONTEXT” BY ASSIGNMENT
• The learning situation offers some direction for the design of data visualization
context. Especially in a learning context, the instrumental uses of the data
visualization are important.
• The assignment should specify how learners should read / use the data
visualization or the data visualization sequence.
• For cognitive scaffolding, it may help to let learners know what to pay attention to in
the respective data visualizations. In a simple case, learners may only need to view the
data visualization and interpret what its meaning is.
• Some assignments can be broadly open-ended, with the data visualization(s) as
a jumping-off point for discussions, analyses, research, and other work.
186. DATA VISUALIZATIONS IN
ONLINE LEARNING CONTEXTS
• A slideshow
• A video
• A simulation
• A discussion board conversation
• A case for analysis
• A role play
• A group project
• A writing assignment
• A research assignment
• A field trip, and others
187. “CONTEXT” BY ISSUE
• Another method to build a surround around a data visualization or series of
data visualizations is by contextualizing these as part of an issue.
• An issue may be an in-world phenomenon, with its own history, evolution,
present, and future. There may be particular dynamics with this phenomenon
and certain levers and mechanisms that may affect the changes to this
phenomenon.
• The data visualization(s) may be presented to highlight aspects of this in-world
issue.
189. WHY USER INTERACTIVITY?
• Data visualizations are not just static and flat files.
• Many enable various types of interactions:
• adjusting parameters of a model (such as data inputs and outputs);
• engaging time (speeding it up, slowing it down, stopping it);
• zooming in and out to disambiguate data, interrelationships, and other dimensions,
and
• accessing underlying data.
• Interactions with data visualizations may enable easier learning (with lower
cognitive loads) and the creation of insights.
190. DATA INPUTS AND OUTPUTS
• There are a number of data visualizations (built on NetLogo, Wolfram Language,
and others) that enable users to change up the parameters of the data
visualizations (including data) in order to see what will happen.
• Such data visualizations are focused on system effects of different parameters.
• Often, inputs may be emplaced with slider bars or forms.
• In some cases, it is important to design these with natural data limits (so as not to
enable going beyond reality). In other cases, such interactive data visualizations are
able to be informed by imaginary data ranges and others.
• Some of these data visualizations enable predictivity into the unknown and into the
future. (Agent-based models can be played out into imagination realms by enabling
hundreds of thousands of iterations or more, to see how systems change over time
given theoretical parameters.)
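A minimal sketch (in Python, not NetLogo or Wolfram Language) of a parameter-driven model behind such a visualization: a user-supplied growth rate is clamped to "natural" limits before the simulation runs, mimicking a slider with hard bounds. The logistic-growth model and its parameter ranges are illustrative assumptions only:

```python
# Sketch: a slider-style input with natural data limits feeding a
# simple (hypothetical) logistic-growth simulation.
def simulate_growth(rate, steps=10, population=10.0, capacity=100.0):
    rate = max(0.0, min(rate, 1.0))  # natural data limits on the input
    history = [population]
    for _ in range(steps):
        population += rate * population * (1 - population / capacity)
        history.append(population)
    return history

low = simulate_growth(0.1)
high = simulate_growth(0.9)
out_of_range = simulate_growth(5.0)  # silently clamped to 1.0

print(round(low[-1], 1), round(high[-1], 1))
```

Re-running with different rates shows the system effects of different parameters, which is the core of this kind of interactive visualization.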
191. ENGAGING TIME
• Some data visualizations enable viewers to engage time…to start at particular
points of the 4D visualization, to pause, to restart, and so on. Data
visualizations may sometimes be slowed down or sped up.
• The phenomena in such data visualizations include those that illuminate
systems and system effects.
192. ZOOMING IN- AND OUT-
TO DISAMBIGUATE
• Some data visualizations may be sufficiently complex that objects in data
visualizations may be occluded. To disambiguate complex data visualizations,
such as word networks or 3D cluster diagrams, many enable zooming in and
out to disambiguate the data.
• Many of these also enable the moving around of nodes and links in order to
enable clear visibility.
• Some enable zooming in to particular relationships and specific dynamics in the
data.
193. ACCESS TO UNDERLYING DATA
• Another type of interactivity with data visualizations involves viewers accessing
the underlying data behind the data visualization.
• For example, a text set which has been coded for sentiment may be explored
by clicking on a bar on a bar chart, to access the coded data under that
particular level of sentiment. Or a node representing an interview subject may
be clicked to access the underlying transcript.
• This type of interactivity enables the individual to explore the related data
more deeply.
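The drill-down interaction above can be sketched as a mapping from a chart element back to its underlying records. All records, field names, and the `drill_down` helper below are hypothetical:

```python
# Sketch: each bar in a sentiment bar chart maps back to the coded
# records behind it, so "clicking" a bar (here, a function call)
# surfaces the underlying data.
coded_records = [
    {"id": 1, "sentiment": "positive", "text": "The course was engaging."},
    {"id": 2, "sentiment": "negative", "text": "The pacing felt rushed."},
    {"id": 3, "sentiment": "positive", "text": "Clear, useful visuals."},
]

def drill_down(records, sentiment_level):
    """Return the records behind one bar of the chart."""
    return [r for r in records if r["sentiment"] == sentiment_level]

positives = drill_down(coded_records, "positive")
print(len(positives))                # 2
print([r["id"] for r in positives])  # [1, 3]
```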
195. HUMAN DECISION-MAKING
• Data and data visualizations provide information about in-world phenomena
and in-world potentials.
• There are computational methods that enable the surfacing of latent patterns from
data that would be invisible otherwise.
• Data visualizations make latent insights visible and human-perceivable.
• Data dashboards often provide live and real-time data for awareness, decision-making,
and actions.
196. HUMAN DECISION-MAKING (CONT.)
• Ultimately, it is the data behind the data visualization that should inform the
decision-making.
• For data to reflect the world, it has to be properly collected.
• The targeted data have to provide “signal” (indicators of phenomena-of-interest) vs.
“noise” (non-informative static).
• It’s rare that one data visualization or even a sequence or a set will be
sufficiently informative or compelling to sway an important decision, but data
visualizations may be powerful depending on how they are created and
harnessed.
197. HUMAN DECISION-MAKING (CONT.)
• It is rare for all sources to point in one direction.
• If all data sources do sing from the same hymnbook, then it may be that the decision-
makers should have a broader data diet that allows a wider range of informational
sources to be accessed for varying perspectives.
199. “BIG(GISH) DATA”…THE DATA VISUALIZATIONS
• Debated definition of “big(gish) data”:
• Millions of lines of records and numerous columns of attribute values
• N = all (and “all” = everything available and in whatever forms?)
• Structured (datasets and data tables) and semi-structured / unstructured data (text, imagery, audio, video, and others)
• Data may be dynamic (vs. static) and analyzed in transit
• All the usual suspects in terms of data visualizations, plus
• Word clouds
• Cluster diagrams
• Network diagrams
• Mixed-item data visualizations
• Dynamic data usually represented on data dashboards, data crawls, and other fast-changing formats
“BIG DATA” AND DATA VISUALIZATIONS
201. THE UNDERLYING DATA
AND DATA VISUALIZATIONS
• Data are raw, information is selective and processed, and data visualizations are
selective image-based summaries of data and information.
• This data may be descriptive, inferential, deductive, inductive, analytical, conceptual,
predictive, or some mix of the prior.
• Data visualizations may be sourced from a variety of data—some of it empirically
obtained and some from the human imagination.
• Understanding the origins of the data and how it was harvested, created, processed,
handled, and represented is important to understanding data visualizations.
202. VARIOUS TYPES OF DATA
VISUALIZATIONS
• Historically, structured and semi-structured data have particular ways that they
are explored and visually expressed.
• Data visualizations have conventions that they must follow based on prior practice
and common understandings.
• Data visualizations may be in 2D, 3D, and 4D, as well as other dimensions.
• Data visualizations are not word-free zones.
• The words used as labels and descriptors have to be precise and align with the data
representations from the underlying dataset and the data visualization elements
themselves. Language matters.
203. VARIOUS TYPES OF DATA
VISUALIZATIONS (CONT.)
• Data visualizations in online learning may be sequenced, contextualized, and
made-interactive to enhance learning.
• Data visualizations may be manually created, machine-drawn from data, or
some combination of the prior.
• Data visualizations—both static and dynamic—may be used to inform and
enhance human decision-making.
204. EFFECTIVENESS
• To be effective, data visualizations have to
• represent the underlying data accurately
• highlight relevant aspects of the data
• employ proper design
• follow basic data visualization conventions for the data and form, among others
• To align with the cognitive theory of multimedia learning, data visualizations
• should enhance learner perception and learning by employing strategies to lighten
cognitive load
205. STAND-ALONE
DATA VISUALIZATION CAPABILITIES
• Data visualizations are usually used in a learning or other context, but they are
often separated from their original contexts and must be understandable even as
stand-alone charts, tables, or figures.
• A data visualization, as a stand-alone, should not lead to misunderstandings (or
negative learning).
• Also, a stand-alone data visualization should be sufficiently professional-looking
because of “optics” and public reputations.
• With reverse image searches, if the original data visualization was found and
mapped by Web crawlers or spiders, it is possible that users may find their way
back to the original context of the data visualization’s usage (unless this content is
behind an authentication layer).
206. STAYING LEGAL
• Data visualizations should be based on solid research practices.
• All sources should be cited and given credit.
• Data should not be handled in any misleading way.
• Data visualizations should be created in legal ways. Relevant laws include
intellectual property, privacy protections, accessibility, and others.
207. HOW TO GET BETTER GOOD
• Train your eyes to see. Go looking for a range of data visualizations.
• Humans have to train themselves to observe more precisely than normal. In everyday
states, people tend to be pretty sloppy observers.
• Go elbow deep in data in all forms.
• Work on visualizing that data using different data visualizations and noting the
strengths and limits of each of the visualizations.
• Any changes to the underlying data mean updates to visualizations. Develop a sense
of when each data processing step is actually pseudo-complete before moving to the
next step (to avoid “make-work”).
208. HOW TO GET BETTER GOOD (CONT.)
• Avoid getting caught up in the dazzle of data visualizations.
• Do not leave unexploited raw data because the focus is on the visualizations.
• Read about how to create effective data visualizations.
• Get familiar with a range of software and analog tools for data visualization.
Put these into practice.
• Experiment broadly.
209. HOW TO GET BETTER GOOD (CONT.)
• Take on a range of masters (publishers, clients, supervisors, students, and
others) with different data visualization needs, and work hard to meet their
needs. Invite healthy and constructive critique, in order to continue to
improve.
• Communicate to the broader public(s) with data visualizations.
• Learn from how the users of data visualizations use them and what they say. If there
are repeated themes in responses, that may be something to pay attention to.
Sometimes, anomalous responses may spark insights.
• Give yourself time to improve (be patient) but not too much time (don’t be
lazy).
• Keep on working at getting better (than wherever you’re at), and aim to get good.
210. DATA VISUALIZATION “SIGNATURES”
Contributing Variables to “Signatures”    % Influence on Signatures
Access to Data                            20
Analyst Name Recognition                  10
Applied Technologies                      10
Data Handling Methods                     10
Domain and Content Area(s)                10
Look and Feel                             20
Research Impact                           20
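Assuming the percentages above are meant as weights summing to 100%, a composite "signature strength" score could be computed as a weighted sum; the component ratings below are hypothetical:

```python
# Sketch: combining the table's percentage weights into one score.
# The 0-1 component ratings are hypothetical placeholders.
weights = {
    "Access to Data": 0.20,
    "Analyst Name Recognition": 0.10,
    "Applied Technologies": 0.10,
    "Data Handling Methods": 0.10,
    "Domain and Content Area(s)": 0.10,
    "Look and Feel": 0.20,
    "Research Impact": 0.20,
}
assert abs(sum(weights.values()) - 1.0) < 1e-9  # the table sums to 100%

ratings = {k: 0.5 for k in weights}  # hypothetical mid-level ratings
score = sum(weights[k] * ratings[k] for k in weights)
print(round(score, 2))  # 0.5
```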
213. DESIRED TECHNOLOGICAL FEATURES
OF DATA VISUALIZATIONS
• Data visualizations should be designed optimally for the following technological
features:
• accessibility
• human readability
• usability across platforms and devices
• machine readability
• preservation (future-proofing across time)
214. ABOUT SOFTWARE TOOLS
• Software tools have to be used appropriately for accurate data visualizations.
• Software tools have differing strengths for data visualizations, so it helps to
know what the respective capabilities are and in what sequence and method
these may be applied for different effects.
• Predictive analytics tools have tests of models…
• Manual drawing tools have grids, guidelines, templates, pre-made shapes, and pullout
capabilities.
• Different software tools may be used for different capabilities in various
sequences. It is rare to just use one tool during the entire sequence of data
cleaning, visualization, polish, and finalization.
215. SOFTWARE USED IN THIS SLIDESHOW
• These data visualizations were created with various types of concepts and data,
data sources, seeding terms, data parameters, and software.
• The software used for data visualizations includes the following (in alphabetical
order): Google Books Ngram Viewer, Google Correlate, IBM’s SPSS Statistics,
LIWC2015, Microsoft Excel 2016, MS Visio, NetLogo, NodeXL template add-on to
Excel (or Network Overview, Discovery and Exploration for Excel, by Microsoft and
available on MS’s CodePlex, which will be decommissioned by Dec. 2017, and
thereafter available off GitHub), NVivo 11 Plus (QSR International), Qualtrics,
RapidMiner Studio, Streamgraph Add-on to MS Excel 2016 (Microsoft Research), and
Tableau Public. Backup software for digital data visualization processing includes
Gadwin PrintScreen and Adobe Photoshop.
• Note: The presenter has no professional ties to any of the software makers
mentioned here.
216. CONTACT AND CONCLUSION
• Dr. Shalin Hai-Jew
• Instructional Designer
• iTAC
• Kansas State University
• 212 Hale / Farrell Library
• shalin@k-state.edu
• 785-532-5262
• Data Sources: In the few cases where
outside open data was used, the sources are
cited. Otherwise, all other data were
collected by the author, and the
visualizations were self-generated. General
research sources are cited via links.
• Thanks! I am grateful to the organizers of
the 4th Annual Big 12 Teaching & Learning
Conference at Texas Tech University for
including this presentation in their lineup.
• Caveat: This presenter is working at
getting better at data visualizations and is a
long ways from “good” yet.