Virtually every type of online learning involves some form of data visualization. Common data visualizations include timelines, process diagrams, line graphs, bar charts, pie charts, treemap diagrams, dendrograms, cluster diagrams, geographical maps, network graphs, word clouds, word networks, scatter diagrams, scatterplot matrices, intensity matrices, decision trees, and others. Indeed, there is also data in screenshots, photos, drawings, videos, and other types of visuals. Online dashboards contain rich data visualizations to convey dynamic data. Some data, such as big data, may only be conveyed visually for human understanding and interpretation; in raw form, the meaning is obscured and elusive. Data visualizations highlight salient aspects of data, and they have to be aligned to particular uses: (1) user awareness and understanding, (2) data analytics, and (3) decision-making. This session defines some best practices for informative and engaging data visualizations for online learning. Original real-world examples are provided from modern software programs.
Creating Effective Data Visualizations in Excel 2016: Some Basics. Shalin Hai-Jew
One of the mainstays of a modern software toolkit is Excel 2016, from Microsoft Office 2016. By reputation, Excel is considered a beginner’s tool that self-respecting data analysts would bypass, but Excel is fairly high-powered, can take up to 1,048,576 rows of data (about 1.05 million) per worksheet, contains complex statistical analysis capabilities (without the need for scripting), and enables rich data visualizations. It has a number of rich add-ons that empower different analytical and data visualization functionalities. It works as a great bridging tool to more complex types of statistical analyses.
This session walks participants through some basic built-in data visualizations in Excel 2016, including pie and doughnut charts, bar charts, treemaps and sunburst diagrams, cluster diagrams, spider (radar) charts, scatterplots, and others. The session covers how data structures and desired emphases determine the options for particular data visualizations.
In this session, participants will
review how to load a data table,
read the general data in a data table (or worksheet),
process or clean the data as needed,
use the Recommended Charts feature,
decide which built-in data visualizations to use, and
consider how to add relevant data visualization elements (including data labels, background grids, axis labels, and titles) for a coherent and effective data visualization.
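The chart-selection logic in the steps above can be sketched as a simple rule set. This is illustrative Python only; Excel's Recommended Charts feature uses its own (unpublished) heuristics, and the thresholds below are assumptions for demonstration:

```python
def recommend_chart(num_categories, num_series,
                    parts_of_whole=False, time_ordered=False):
    """Suggest a chart type from simple data-shape heuristics.

    Illustrative rules only; Excel's Recommended Charts feature
    uses its own (unpublished) heuristics.
    """
    if time_ordered:
        return "line chart"            # trends over ordered time points
    if parts_of_whole and num_series == 1 and num_categories <= 6:
        return "pie chart"             # a few slices of one whole
    if num_series >= 4:
        return "radar (spider) chart"  # many measures per category
    return "bar chart"                 # default categorical comparison

# One series of five categories that sum to a whole:
print(recommend_chart(5, 1, parts_of_whole=True))  # pie chart
```

The point of the sketch is the session's central idea: the structure of the data (number of series, whether it is time-ordered, whether categories are parts of a whole) constrains which visualizations are even candidates.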
Also, participants will help co-build data visualizations from open-source and other datasets.
Using Decision Trees to Analyze Online Learning Data. Shalin Hai-Jew
In machine learning, decision trees enable researchers to identify possible indicators (variables) that are important in predicting classifications, and these offer a sequence of nuanced groupings. For example, are there “tells” which would suggest that a particular student will achieve a particular grade in a course? Are there indicators that would identify learners who would select a particular field of study vs. another?
This session will introduce how decision trees are used to model data based on supervised machine learning (with labeled training-set data) and how such models may be evaluated for accuracy with test data, using the open-source tool RapidMiner Studio. Several related analytical data visualizations will be shared: 2D spatial maps, decision trees, and others. Attendees will also see how 2x2 contingency tables represent Type 1 and Type 2 errors, how the accuracy of a machine learning model may be assessed, and the strengths and weaknesses of decision trees applied to some use cases from higher education. Various examples of possible outcomes will be discussed, along with pre-modeling (vs. post hoc) theorizing about what may be seen in terms of particular variables. The basic data structure for running the decision tree algorithm will be described. If time allows, relevant parameters for a decision tree model will be discussed: criterion (gain_ratio, information_gain, gini_index, and accuracy), minimal size for split, minimal leaf size, minimal gain, maximal depth (based on the need for human readability of decision trees), confidence, and pre-pruning (and the desired level).
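The split criteria and the contingency-table accuracy named above can be computed directly. A minimal plain-Python sketch (not RapidMiner Studio's implementation):

```python
from math import log2

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    probs = [labels.count(c) / n for c in set(labels)]
    return -sum(p * log2(p) for p in probs)

def information_gain(parent, left, right):
    """Entropy reduction from splitting `parent` into `left` and `right`."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted

def accuracy(tp, fp, fn, tn):
    """Overall accuracy from a 2x2 contingency (confusion) table."""
    return (tp + tn) / (tp + fp + fn + tn)

parent = ["pass", "pass", "fail", "fail"]
print(gini(parent))                                                  # 0.5
print(information_gain(parent, ["pass", "pass"], ["fail", "fail"]))  # 1.0
print(accuracy(40, 10, 5, 45))                                       # 0.85
```

A split that perfectly separates the pass/fail labels yields the maximum information gain of 1.0 bit; a tree-induction algorithm chooses, at each node, the split that maximizes such a criterion.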
Formations & Deformations of Social Network Graphs. Shalin Hai-Jew
Social network graphs are node-link (vertex-edge; entity-relationship) diagrams that show relationships between people and groups. Open-source tools like NodeXL Basic (available on Microsoft’s CodePlex) enable the capture of network data from select social media platforms through third-party add-ons and social media APIs. From social groups, relational clusters are extracted with clustering algorithms which identify intensities of connections. Visually, structural relational data is conveyed with layout algorithms in two-dimensional space. Using these various layout options and built-in visual design features, it is possible to aesthetically “deform” the network graph data for visual effects. This presentation introduces novel datasets and novel data visualizations.
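The cluster-extraction idea can be illustrated with a minimal sketch that groups nodes into connected components. This is plain Python for illustration; the clustering algorithms in tools like NodeXL (e.g. Clauset-Newman-Moore) are considerably more sophisticated, and the small social graph below is invented:

```python
from collections import deque

def clusters(graph):
    """Group nodes of an adjacency-list graph into connected
    components (a simple stand-in for relational cluster extraction)."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        component, queue = set(), deque([start])
        while queue:                       # breadth-first traversal
            node = queue.popleft()
            if node in component:
                continue
            component.add(node)
            queue.extend(graph.get(node, ()))
        seen |= component
        components.append(component)
    return components

# Two separate friend groups in a small (hypothetical) social graph:
graph = {
    "ann": ["bob"], "bob": ["ann", "cat"], "cat": ["bob"],
    "dan": ["eve"], "eve": ["dan"],
}
print(clusters(graph))  # two clusters: {ann, bob, cat} and {dan, eve}
```

Once clusters like these are identified, a layout algorithm assigns the nodes coordinates in two-dimensional space, which is where the aesthetic "deformations" discussed in the session come in.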
Using Large-Scale LMS Data Portal Data to Improve Teaching and Learning (at K... Shalin Hai-Jew
With any learning management system, a byproduct of its function is data, which may be analyzed to improve awareness, decision-making, and actions. Kansas State University’s Canvas LMS instance recently made available its cumulative data, dating from the university’s first use of the system in 2013. These flat files open a window into how the university is harnessing its LMS, with some macro-level insights that may suggest areas for improving teaching and learning. This session describes some basic approaches to informatizing this empirical “big data”: reviewing the data dictionary, extracting basic descriptions of the respective data sets, conducting time-based comparisons, surfacing testable hypotheses from data inferences, and conducting other data explorations. This introduces initial data analysis work only, but it does not preclude front-end analysis of courses at the micro level, relational database queries of the data, and other potential follow-on work.
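Extracting basic descriptions from such a flat file takes only a few lines of Python. The column names and values below are invented for illustration and are not actual Canvas data-dictionary fields:

```python
import csv
import io
import statistics

# Hypothetical flat-file export; real Canvas files follow the data dictionary.
flat_file = io.StringIO(
    "course_id,enrollments,published\n"
    "101,35,true\n"
    "102,210,true\n"
    "103,12,false\n"
)

rows = list(csv.DictReader(flat_file))
enrollments = [int(r["enrollments"]) for r in rows]

print("courses:", len(rows))
print("mean enrollment:", statistics.mean(enrollments))
print("published share:", sum(r["published"] == "true" for r in rows) / len(rows))
```

Even descriptives this simple (counts, means, shares) support the macro-level comparisons the session describes, before any relational database work is attempted.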
The K-State Online Canvas LMS Data Portal and Five Years of Activated Third-P... Shalin Hai-Jew
The presenter will introduce the K-State LMS data portal, share some available insights from it, and focus on one particular facet of this big data: the third-party apps that K-State faculty, administrators, and staff have activated, and what that says about how Canvas is being used.
Canvas LMS data portal for the Kansas State University instance
A data dictionary: Version 1.16.2 (https://portal.inshosteddata.com/docs)
Data extraction and processing
What it can tell us: (un)available data and information
Activated third-party tools in K-State Online Canvas LMS instance
Some caveats
What this says about what K-Staters (early adopters) are using
Practical applications of this third-party app activation data
Adding value to LMS data portal data
Leveraging Flat Files from the Canvas LMS Data Portal at K-State. Shalin Hai-Jew
A lot of data are created in an LMS instance, and much of this can be analyzed for insight. In 2016, Instructure, the makers of Canvas, made their LMS data available to their customers through a data portal (updated monthly). This portal enables access to a number of flat files related to that particular instance. This presentation showcases how this big data was analyzed on a regular laptop with basic office software, to summarize Kansas State University’s use of the LMS. Methods for analysis include the following: basic descriptive statistics, survival analysis, computational linguistic analysis, and others.
The results are reported out with both numbers and data visualizations, including classic pie charts, line graphs, bar charts, mixed charts, word clouds, and others. The findings provide some insights about how to approach the data, how to use a data dictionary, and other methods for extracting the data for awareness and practical decision-making. This work also suggests next steps for more advanced analysis (using the flat files in a SQL database).
More information about this may be accessed at http://scalar.usc.edu/works/c2c-digital-magazine-spring--summer-2017/wrangling-big-data-in-a-small-tech-ecosystem.
Using the Qualtrics Research Suite as a Training LMS. Shalin Hai-Jew
While learning management systems (LMSes) are a particular technology class, plenty of other technologies have been harnessed and retrofitted for learning purposes. Alternate LMSes include social media platforms like Twitter and survey systems. This presentation summarizes the uses of an online research (survey) suite, Qualtrics, for large-scale compliance-based trainings at Kansas State University. Some affordances of Qualtrics as a training LMS include the following:
a multimedia approach (including mobile),
a file upload approach,
logic functions,
branching logic for versioning a training (based on profiles or behaviors),
a scoring feature for grading performance,
logic tools to set standards for pass/fail,
integration with Google Translate,
security features,
built-in data analytics tools,
a dashboard for versioning different reports for different data clients,
an API for recording training completions and performances in information systems,
easy data export for external analytics,
and unique “skins”.
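The scoring and pass/fail logic that such a training relies on can be modeled in a few lines of Python. This is a conceptual sketch only: Qualtrics configures the equivalent through its survey editor rather than code, and the questions, answer key, and 80% threshold below are invented for illustration:

```python
def score_training(responses, answer_key, pass_threshold=0.8):
    """Score a training attempt and apply a pass/fail standard.

    A conceptual model of survey-tool scoring logic; the threshold
    is an assumption, not a Qualtrics default.
    """
    correct = sum(responses.get(q) == a for q, a in answer_key.items())
    score = correct / len(answer_key)
    return score, score >= pass_threshold

key = {"q1": "b", "q2": "d", "q3": "a", "q4": "c", "q5": "b"}
responses = {"q1": "b", "q2": "d", "q3": "a", "q4": "a", "q5": "b"}
score, passed = score_training(responses, key)
print(f"score={score:.0%}, passed={passed}")  # score=80%, passed=True
```

The same score-and-threshold idea underlies the branching logic as well: a learner's computed score can route them to remediation content or to a completion page.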
Capitalizing on Machine Reading to Engage Bigger Data. Shalin Hai-Jew
What are some ways to select, say, 200 research articles to “close read” from a set of 2,000 PDF articles gleaned from library databases and Google Scholar? How can a researcher make sense of a trending issue in the flood of Tweets and retweets based on a particular hashtag (#) or keyword search, or in an especially lively Tweetstream from a particular social media account? People are dealing with ever more prodigious amounts of information, from a number of sources. Those who are savvy about using computers to aid their reading (through “distant reading” or “not-reading”) may find that they are able to cover much more ground. This presentation introduces the use of NVivo 11 Plus (matrix queries, word frequency counts, text searches and dendrograms, cluster analyses, topic modeling, and others) for multiple cases of distant reading to aid in academic and research work.
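At its simplest, distant reading starts with word frequency counts, sketched here in plain Python. NVivo 11 Plus layers stemming, configurable stopword lists, and visualization on top of this basic idea; the small stopword set below is an assumption for demonstration:

```python
import re
from collections import Counter

# A deliberately tiny stopword list; real tools use much larger ones.
STOPWORDS = {"the", "a", "of", "and", "to", "in", "is", "are"}

def word_frequencies(text, top_n=5):
    """Count the most frequent non-stopword terms in a text."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS).most_common(top_n)

sample = ("Distant reading uses computers to read texts at scale. "
          "Reading at scale surfaces themes a close reading may miss.")
print(word_frequencies(sample, top_n=3))  # [('reading', 3), ('at', 2), ('scale', 2)]
```

Frequency counts like these are the raw material for the word clouds, dendrograms, and cluster analyses the presentation demonstrates.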
Semantic Interoperability & Information Brokering in Global Information Systems. Amit Sheth
Amit Sheth, "Semantic Interoperability and Information Brokering in Global Information Systems," Keynote talk at IEEE-Metadata Conference, Bethesda, MD, USA, April 6, 1999.
Key coverage:
Use of ontologies for semantic interoperability (http://knoesis.org/library/resource.php?id=00277)
InfoHarness (http://knoesis.org/library/resource.php?id=00275) and VisualHarness (http://knoesis.org/library/resource.php?id=00267) demonstrate faceted search
MREF, which puts metadata on HREF, was well ahead of its time (see: http://knoesis.org/library/resource.php?id=00294)
Multi-ontology query processing in the OBSERVER system (http://knoesis.org/library/resource.php?id=00273)
Semantic Interoperability in Infocosm: Beyond Infrastructural and Data Intero... Amit Sheth
Amit Sheth, Keynote: International Conference on Interoperating Geographic Systems (Interop’97), Santa Barbara, December 3-4, 1997.
Related technical paper: http://knoesis.org/library/resource.php?id=00230
When digital learning objects (DLOs) were initially conceptualized, based on object-oriented programming, there were high hopes that people could build learning objects that were re-usable by others. DLOs have come a long way in the past few decades, and many are available for free in various repositories, referatories, digital libraries, and other sources. In a recent research project, the presenter explored what features of DLOs make them adoptable for online learning and created a ten-element model for DLO adoption. The reality is that adoption of DLOs is neither cost-free nor effort-free. The ten elements include the following categories:
Pedagogical Value
Learner Engagement
Presentational Features
Legal Considerations
Technological Features
Instructor (Adopter) Control
Applicability to the Respective Learning Contexts (Local Conditions)
Local Costs to Deploy
Labeling and Documentation, Contributor and Informational Source Crediting
Global Transferability and Adoptability
She then analyzed her decades of work in instructional design in higher education (and private industry) to see which features were addressed in the respective funded DLOs. She found discrepancies between what makes DLOs adoptable and what is actually built, and she suggests some practical ways to close those gaps with techniques and technologies, in order to further support and propel the “digital learning object economy”.
Knowledge maps for e-learning. Jae Hwa Lee, Aviv Segev
Maps such as concept maps and knowledge maps are often used as learning materials. These maps have nodes and links, nodes as key concepts and links as relationships between key concepts. From a map, the user can recognize the important concepts and the relationships between them. To build concept or knowledge maps, domain experts are needed. Therefore, since these experts are hard to obtain, the cost of map creation is high. In this study, an attempt was made to automatically build a domain knowledge map for e-learning using text mining techniques. From a set of documents about a specific topic, keywords are extracted using the TF/IDF algorithm. A domain knowledge map (K-map) is based on ranking pairs of keywords according to the number of appearances in a sentence and the number of words in a sentence. The experiments analyzed the number of relations required to identify the important ideas in the text. In addition, the experiments compared K-map learning to document learning and found that K-map identifies the more important ideas.
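The TF/IDF scoring the study relies on can be sketched in plain Python. This is a minimal version; the study's K-map construction then ranks keyword pairs by sentence-level co-occurrence, which is not shown here, and the three documents below are invented:

```python
import math
from collections import Counter

def tf_idf(docs):
    """Score each term in each document by term frequency times
    inverse document frequency (a minimal TF/IDF sketch)."""
    n = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for doc in tokenized for term in set(doc))
    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n / df[term])
            for term, count in tf.items()
        })
    return scores

docs = [
    "concept maps support learning",
    "knowledge maps support e-learning",
    "domain experts build knowledge maps",
]
scores = tf_idf(docs)
# "maps" occurs in every document, so its IDF (and thus its score) is zero,
# while rarer terms like "concept" score higher as candidate keywords.
print(scores[0]["maps"], scores[0]["concept"])
```

This is why TF/IDF suits keyword extraction: terms common to the whole collection are discounted, leaving the terms that characterize each document.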
"Mass Surveillance" through Distant Reading. Shalin Hai-Jew
Distant reading refers to the uses of computers to “read” texts by counting words, identifying themes and subthemes (through topic modeling), extracting sentiment, applying psychological analysis to the author(s), and otherwise finding latent or hidden insights. This work is based on research on “mass surveillance” across five text sets: academic, mainstream journalism, microblogging, Wikipedia articles, and leaked government data. The purpose was to capture some insights about the collective social discussions occurring around this issue in an indirect way. This presentation uses a variety of data visualizations (article network graphs, word trees, dendrograms, treemaps, cluster diagrams, line graphs, bar charts, pie charts, and others) to show how machines read and the types of summary data they enable (at computational speeds, at machine scale, and in a reproducible way). Also, some computational linguistic analysis tools enable the creation of custom dictionaries for unique types of applied research. The tools used in this presentation include NVivo 11 Plus and LIWC2015.
Making the Most of the New File Upload Question Feature in an LMS: Nine Appl... Shalin Hai-Jew
In Canvas and Qualtrics, a recently added feature enables learners (or survey participants) to upload digital files. While these tools have varying limits (on file sizes, file types, file handling, identification or anonymization of file uploaders, and the level of sharing of uploaded files), it is useful to think through assignment possibilities in order to maximize this feature. This presentation provides some preliminary instructional design guidance for building effective assignments using the file upload feature. The session also addresses considerations like intellectual property, privacy rights, and proper handling of digital contents by learners and instructors, as well as data security and protections.
Finally, there are discussions about storage limits for file uploads (within an online course), as well as digital preservation (whether the uploaded files are temporary and transient, semi-permanent, or permanent, for learning purposes).
Matrix Queries and Matrix Data Representations in NVivo 11 Plus. Shalin Hai-Jew
This slideshow, "Matrix Queries and Matrix Data Representations in NVivo 11 Plus," covers the following points:
Matrices and their basic structures
Types of elements (variables) for matrix comparisons
Setting up matrix queries in NVivo 11
Specific matrix “use cases” in qualitative and mixed methods research
Wrap-up
Using Qualtrics to Create Automated Online Trainings. Shalin Hai-Jew
When thinking about “transformational teaching and learning,” training may not be the first thing to come to mind.
The Qualtrics® research suite offers a number of design tools and features that enable the building of automated online trainings. There are the baseline features such as the ability to integrate multimedia, apply various question designs, enable accessibility features (like alt-texting), deliver a mobile experience, reach learners across distances, and provide basic security and data integrity features.
Other features make this tool phenomenally powerful. One is the ability to richly customize learning sequences by learner profile, by performance (behavior), by selection, or by a mix of factors. Another is the scoring of learner responses and the ability to set a threshold for passing. The tool has rich data analytics capabilities (including a light item analysis), with online analytics and even cross-tabulation analysis. A Qualtrics® API enables automated recording of online assessment scores and learner behaviors in faculty, staff, or student information systems.
Trainings are critical for effective workplace functioning and professional development. The same features in Qualtrics® that enable the effective building of automated trainings also enable the effective building of pre-learning modules or sequences for learners who need to refresh their skills for a new course. This digital slideshow introduces the use of Qualtrics® as a customizable training and pre-learning module tool.
Building a Digital Learning Object w/ Articulate Storyline 2. Shalin Hai-Jew
The digital learning object (DLO) is still a common staple in online learning. One of the more sophisticated authoring tools for building DLOs is Articulate Storyline 2, which enables the integration of multimedia (including screen captures with Articulate Replay), the building of animations, branching, and other features. Its packaging allows a full range of SCORM and Tin Can API outputs and versioning in HTML5. This presentation introduces the software tool and some of its capabilities, to provide a sense of where digital learning objects may be headed.
Using the Qualtrics Research Suite as a Training LMS Shalin Hai-Jew
While learning management systems (LMSes) are a particular technology class, plenty of other technologies have been harnessed and retrofitted for learning purposes. Alternate LMSes include social media platforms like Twitter and survey systems. This presentation summarizes the uses of an online research (survey) suite, Qualtrics, for large-scale compliance-based trainings at Kansas State University. Some affordances of Qualtrics as a training LMS include the following:
a multimedia approach (including mobile),
a file upload approach,
logic functions,
branching logic for versioning a training (based on profiles or behaviors),
a scoring feature for grading performance,
logic tools to set standards for pass/fail,
integration with Google Translate,
security features,
built-in data analytics tools,
a dashboard for versioning different reports for different data clients,
an API for recording training completions and performances in information systems,
easy data export for external analytics,
and unique “skins”.
Capitalizing on Machine Reading to Engage Bigger DataShalin Hai-Jew
What are some ways to select, say, 200 research articles to “close read” from a set of 2,000 PDF articles gleaned from library databases and Google Scholar? How can a researcher make sense of a trending issue in the flood of Tweets and RT based on a particular hashtag (#) or keyword search or an especially lively Tweetstream based on a particular social media account? People are dealing with ever more prodigious amounts of information—from a number of sources. Those who are savvy to the uses of computers to aid their reading (through “distant reading” or “not-reading”) may find that they are able to cover much more ground. This presentation introduces the use of NVivo 11 Plus (matrix queries, word frequency counts, text searches and dendrograms, cluster analyses, topic modeling, and others) for multiple cases of distant reading to aid in academic and research work.
Semantic Interoperability & Information Brokering in Global Information SystemsAmit Sheth
Amit Sheth, "Semantic Interoperability and Information Brokering in Global Information Systems," Keynote talk at IEEE-Metadata Conference, Bethesda, MD, USA, April 6, 1999.
Key coverage:
Use of ontologies for semantic interoperability (http://knoesis.org/library/resource.php?id=00277); InfoHarness (http://knoesis.org/library/resource.php?id=00275) and VisualHarness (http://knoesis.org/library/resource.php?id=00267) demonstrate faceted search; MREF - putting metadata on HREF is way ahead of its time (see: http://knoesis.org/library/resource.php?id=00294); multi-ontology query processing in OBSERVER system (http://knoesis.org/library/resource.php?id=00273)
Semantic Interoperability in Infocosm: Beyond Infrastructural and Data Intero...Amit Sheth
Amit Sheth, Keynote: International Conference on Interoperating Geographic Systems (Interop’97), Santa Barbara, December 3-4 1997.
Related technical paper: http://knoesis.org/library/resource.php?id=00230
When digital learning objects (DLOs) were initially conceptualized, based on object-oriented programming, there were initial high hopes that people could build learning objects that were re-usable by others. DLOs have come a long way in the past few decades, and many are available for free on various repositories, referatories, digital libraries, and other sources. In a recent research project, the presenter explored what features of DLOs make them adoptable for online learning and created a ten-element model for DLO adoption. The reality is that adoption of DLOs is not cost-free and not effort-free. The ten elements include the following categories:
Pedagogical Value
Learner Engagement
Presentational Features
Legal Considerations
Technological Features
Instructor (Adopter) Control
Applicability to the Respective Learning Contexts (Local Conditions)
Local Costs to Deploy
Labeling and Documentation, Contributor and Informational Source Crediting
Global Transferability and Adoptability
She then analyzed her decades of work in instructional design in higher education (and private industry) to see what features were addressed in the respective funded DLOs. She found discrepancies between what makes DLOs adoptable and what is built and suggests some practical ways to close those gaps with techniques and technologies, in order to further support and propel the “digital learning object economy”.
Knowledge maps for e-learning. Jae Hwa Lee, Aviv Segev
Maps such as concept maps and knowledge maps are often used as learning materials. These maps havenodes and links, nodes as key concepts and links as relationships between key concepts. From a map, theuser can recognize the important concepts and the relationships between them. To build concept orknowledge maps, domain experts are needed. Therefore, since these experts are hard to obtain, the costof map creation is high. In this study, an attempt was made to automatically build a domain knowledgemap for e-learning using text mining techniques. From a set of documents about a specific topic,keywords are extracted using the TF/IDF algorithm. A domain knowledge map (K-map) is based onranking pairs of keywords according to the number of appearances in a sentence and the number ofwords in a sentence. The experiments analyzed the number of relations required to identify theimportant ideas in the text. In addition, the experiments compared K-map learning to document learningand found that K-map identifies the more important ideas
"Mass Surveillance" through Distant ReadingShalin Hai-Jew
Distant reading refers to the uses of computers to “read” texts by counting words, identifying themes and subthemes (through topic modeling), extracting sentiment, applying psychological analysis to the author(s), and otherwise finding latent or hidden insights. This work is based on research on “mass surveillance” across five text sets: academic, mainstream journalism, microblogging, Wikipedia articles, and leaked government data. The purpose was to capture some insights about the collective social discussions occurring around this issue in an indirect way. This presentation uses a variety of data visualizations (article network graphs, word trees, dendrograms, treemaps, cluster diagrams, line graphs, bar charts, pie charts, and others) to show how machines read and the types of summary data they enable (at computational speeds, at machine scale, and in a reproducible way). Also, some computational linguistic analysis tools enable the creation of custom dictionaries for unique types of applied research. The tools used in this presentation include NVivo 11 Plus and LIWC2015.
Making the Most of the New File Upload Question Feature in an LMS: Nine Appl...Shalin Hai-Jew
In Canvas and Qualtrics, a recent new feature enables learners (or survey participants) to upload digital files. While these have varying limits—of file sizes, of file types, of file handling, identification or anonymization of file uploaders, and the level of sharing of uploaded files—it is useful to think of assignment possibilities in order to maximize this feature. This presentation provides some preliminary instructional design for how to build effective assignments using the file upload feature. This session also involves considerations like intellectual property, privacy rights, and proper handling of digital contents by learners and instructors. There are also considerations for data security and protections.
Finally, there are discussions about memory limits for file uploads (within an online course), as well as digital preservation (whether the uploaded files are temporary and transient or semi-permanent or permanent, for learning purposes).
Matrix Queries and Matrix Data Representations in NVivo 11 PlusShalin Hai-Jew
This slideshow, "Matrix Queries and Matrix Data Representations in NVivo 11 Plus," covers the following points:
Matrices and their basic structures
Types of elements (variables) for matrix comparisons
Setting up matrix queries in NVivo 11
Specific matrix “use cases” in qualitative and mixed methods research
Wrap-up
Semantic Interoperability and Information Brokering in Global Information Sys...Amit Sheth
Amit Sheth, "Semantic Interoperability and Information Brokering in Global Information Systems," Keynote talk at IEEE-Metadata Conference, Bethesda, MD, USA, April 6, 1999.
Using Qualtrics to Create Automated Online TrainingsShalin Hai-Jew
When thinking about “transformational teaching and learning,” training is not usually the first thing to come to mind.
The Qualtrics® research suite offers a number of design tools and features that enable the building of automated online trainings. There are the baseline features such as the ability to integrate multimedia, apply various question designs, enable accessibility features (like alt-texting), deliver a mobile experience, reach learners across distances, and provide basic security and data integrity features.
Other features actually make this tool phenomenally powerful. One is the ability to richly customize learning sequences—by learner profile, by performance (behavior), by selection, or a mix of factors. There is a feature that enables the scoring of learner responses and the ability to set a threshold for passing. This tool has a rich data analytics capability (including a light item analysis), including online analytics and even cross-tabulation analysis. A Qualtrics® API enables the recording of online assessment scores and learner behaviors, in an automated way to faculty / staff / student information systems.
Trainings are critical for effective workplace functioning and professional development. The same features in Qualtrics® that enable the effective building of automated trainings also enable the effective building of pre-learning modules or sequences for learners who need to refresh their skills for a new course. This digital slideshow introduces the use of Qualtrics® as a customizable training and pre-learning module tool.
Building a Digital Learning Object w/ Articulate Storyline 2Shalin Hai-Jew
The digital learning object (DLO) is still a common staple in online learning. One of the more sophisticated authoring tools to build DLOs is Articulate Storyline 2, which enables the integration of multimedia (including screen captures with Articulate Replay), the building of animations, branching, and other features. Its packaging allows a full range of SCORM and Tin Can API outputs and versioning in HTML 5. This presentation will introduce the software tool and some of its capabilities to provide a sense of where digital learning objects may be headed.
INDIAN STATISTICAL INSTITUTE
Documentation Research & Training Centre
8th Mile, Mysore Road, RVCE Post
Bangalore-560 059
DRTC Seminar- 5
2014
Data Literacy
ABSTRACT
In our increasingly data-driven society, data literacy is an important civic skill which we should be developing. Data are slowly but steadily forcing their way into our societies. Data literacy may seem less technical than computer science or other fields. Still, we need to engage a wide variety of tools for accessing, converting, and manipulating data. These require an understanding of relational databases (like MS Access), data manipulation techniques, statistical software tools (like Minitab, SPSS, STATA, and MS Excel), and data representation software tools (like MS PowerPoint and MS Excel). This seminar includes an introduction to data literacy and its inter-relationship with information literacy and statistical literacy. It also includes various steps for working with data, followed by a short demonstration of data analysis techniques using the software STATA11.
Speaker: Jayanta Kr. Nayek
Date: 29.10.2014. Time: 2 p.m.
Venue: DRTC, ISI Bangalore.
All are cordially invited.
Seminar Coordinator
Biswanath Dutta
Data visualization in data science: exploratory (EDA) and explanatory visualization; Anscombe's quartet, design principles, visual encoding, design engineering and journalism, choosing the right graph, narrative structures, technology and tools.
Building Surveys in Qualtrics for Efficient AnalyticsShalin Hai-Jew
Qualtrics® is a state-of-the-art online research suite which enables sophisticated data collection and analytics. This presentation will describe how to build a survey for efficient analytics, both within Qualtrics® and outside Qualtrics®. This presentation emphasizes the importance of thinking through the data collection, the analytics, and the data presentation, in order to build a survey instrument that works for the research context. Along the way, some of the cutting-edge survey-building capabilities of Qualtrics® (including rich question types, invisible questions, branching logic, display logic, panel triggers, and others), will be showcased along with the data analytics functionalities (including cross-tab analysis and data visualizations).
A walk through the maze of understanding Data Visualization using several tools such as Python, R, Knime and Google Data Studio.
This workshop is hands-on, and this set of presentations is designed to serve as an agenda for the workshop.
Prerequisites of DBMS
Course Objectives of DBMS
Syllabus
What is the meaning of data and database
DBMS
History of DBMS
Different Databases available in Market
Storage areas
Why to Learn DBMS?
People who work with Databases
Applications of DBMS
Long nonfiction chapters are not in style and may never have been. While average nonfiction book chapters run about 4,000 – 7,000 words in length, some run several times that upper figure. The explanation is that there is some irreducible complexity that the chapter addresses that cannot be handled in shorter form. This slideshow explores some methods for writing longer chapters while still maintaining coherence, focus, and reader interest…and while using some technological tools to write and edit more efficiently.
Overcoming Reluctance to Pursuing Grant Funds in AcademiaShalin Hai-Jew
Starting as an organization’s new grant writer can be a challenge, especially in a case where there has been a time lapse since the last one left. People get out of the habit of pursuing grant funds. This slideshow addresses some of the reasons for such reluctance and proposes some ways to mitigate these.
Writing grants is one common way that those in institutions of higher education may acquire some funds—small and big, one-off and continuing—to conduct research, hire faculty and researchers and learners and others, update equipment, update or build up new buildings, and achieve other work. This slideshow explores some aspects of the work of grant writing in the present moment in higher education.
Contrasting My Beginner Folk Art vs. Machine Co-Created Folk Art with an Art-...Shalin Hai-Jew
The SARS-CoV-2 pandemic inspired several years of experimentation with common or folk art, involving mixed media, alcohol ink painting, and other explorations. Then, with the emergence of art-making generative AIs, there were further experiments, particularly with one that enables generation of visuals from scanned art and photos, text prompts, style overlays, and text-based visual modifiers. While both types of artmaking are emotionally satisfying and helpful for stress management, there are some contrasting differences. This exploratory slideshow explores some of these differences in order to partially shed light on the informal usage of an art-making generative AI (artificial intelligence).
Creating Seeding Visuals to Prompt Art-Making Generative AIsShalin Hai-Jew
Art-making generative AIs have come to the fore. A basic work pipeline typically involves starting with text prompts -> generated images. That image may be used to seed further iterations. Deep Dream Generator (DDG) enables the application of “modifiers” of various types (artist styles, visual adjectives, others) to be applied in addition to the text prompt.
Another approach involves beginning with a “seeding image,” a born-digital or digitized (born-analog) visual on which AI-generated art may be based for a multi-channel and multi-modal prompt. This slideshow provides some observations of how to think about seeding images, particularly in terms of how the DDG handles them, with its “algorithmic pareidolia” (“Deep Dream,” Wikipedia, July 3, 2023).
Human art-making often feeds into mass-scale conversations. Artists are thought to help bridge humanity into the future. Whether generative AI art enables this or not is still not clear.
Common Neophyte Academic Book Manuscript Reviewer MistakesShalin Hai-Jew
The work of academic book reviewing, as a volunteer (most often), is a common academic practice. The presenter served as a neophyte reviewer for some years before settling into this invited volunteer work for several decades. There have been lessons learned over time about avoidable mistakes…from both experience and observation.
Fashioning Text (and Image) Prompts for the CrAIyon Art-Making Generative AIShalin Hai-Jew
CrAIyon (formerly DALL-E Mini, named after Salvador Dalí) is a web-facing art-making generative AI tool online (https://www.craiyon.com/) that enables the use of text (and image) prompts for the creation of watermarked, lightweight visuals. Counterintuitively, the rough visuals are much more usable for recombinations, remixes, and recreations into usable digital visuals for various digital learning objects. The textual prompts are not particularly intuitive because of how the generative AI program was trained on mass-scale visuals. There is an art and occasional indirection to reworking prompts after each try, with the resulting nine-image proof sheets that CrAIyon outputs. The tool can be used iteratively for different outputs.
The tool sometimes turns out serendipitous surprises, including an occasional work so refined that it can be used / shared almost unedited. One challenge in using CrAIyon comes from their request for credit (for all non-subscribers to their service). Another comes from the visual watermarking (orange crayon at the bottom right of the image). However, this tool is quite useful for practical applications if one is willing to engage deep digital image editing (Adobe Photoshop, Adobe Illustrator).
Augmented Reality in Multi-Dimensionality: Design for Space, Motion, Multiple...Shalin Hai-Jew
Augmented reality (AR)—the use of digital overlays over physical space—manifests in a wide range of spaces (indoor, outdoor; virtual) and ways (in real space (with unaided human vision); in head gear; in smart glasses; on mobile devices, and others). There are various authoring technologies that enable the making of AR experiences for various users. This work uses a particular tool (Adobe Aero®) to explore ways to build AR for multiple dimensions, including the fourth dimension (motion, changes over time).
Based on the respective purposes of the AR experience, some basic heuristics are captured for
space design (1),
motion design (2),
multiple perception design (sight, smell, taste, sound, touch) (3),
and virtual- and tangible- interactivity (4).
Some Ways to Conduct SoTL Research in Augmented Reality (AR) for Teaching and...Shalin Hai-Jew
One of the extant questions about augmented reality (AR) is how (in)effective it is for the teaching and learning in various formal, nonformal, and informal contexts. The research literature shows mixed findings, which are often highly context-based (and not generalizable). There are some non-trivial costs to the design/development/deployment of AR for teaching and learning. For the users, there is cognitive load on the working memory [(1) extraneous/poor design, (2) intrinsic/inherent difficulty in topic, and (3) germane/forming schemas]. For teachers, there are additional knowledge, skills, and abilities / attitudes (KSAs) that need to be brought to bear.
Exploring the Deep Dream Generator (an Art-Making Generative AI) Shalin Hai-Jew
The Deep Dream Generator was created by Google engineer Alexander Mordvintsev in 2014. It has a public facing instance at https://deepdreamgenerator.com/, which enables people to use text prompts and image prompts (individually or in combination) to inspire the art-generating generative AI to output images. This work highlights some process-based walk-throughs of the tool, some practical uses, some lightweight art learning, some aspects of the online social community on this platform, and other insights. Some works by the AI prompted by the presenter may be seen here: https://deepdreamgenerator.com/u/sjjalinn.
(This is the first draft of a slideshow that will be used in a conference later in the year.)
Augmented Reality for Learning and AccessibilityShalin Hai-Jew
Recently, the presenter conducted a systematic review of the academic literature and an environmental scan to learn how to set up an augmented reality (AR) shop at an institution of higher education. The ambition was to not only set up AR in an accessible and legal way but also be able to test for potential +/- effects of AR on teaching and learning. The research did not go past the review stage, because of a lack of funding, but some insights about accessibility in AR were acquired.
(The visuals are from Deep Dream Generator and CrAIyon.)
Engaging Pixabay as an open-source contributor to hone digital image editing,...Shalin Hai-Jew
This slideshow describes the author's early experiences with creating two accounts on Pixabay in order to advance digital editing skills in multimedia. The two accounts are located at https://pixabay.com/users/sjjalinn-28605710/ and https://pixabay.com/users/wavegenerics-29440244/ ...
This work explores four main spaces where researchers publish about educational technology: academic-commercial, open-access, open-source, and self-publishing.
Human-Machine Collaboration: Using art-making AI (CrAIyon) as cited work, o...Shalin Hai-Jew
It is early days for generative art AIs. What are some ways to use these to complement one's work while staying legal (legal-ish)?
Correction: .webp is a raster format
Getting Started with Augmented Reality (AR) in Online Teaching and Learning i...Shalin Hai-Jew
University creative shops are exploring whether they can get into the game of producing AR-enhanced experiences: campus tours, interactive gaming, virtual laboratories, exploratory art spaces, simulations, design labs, online / offline / blended teaching and learning modules, and other AR applications.
This work offers a basic environmental scan of the AR space for online teaching and learning, and it includes pedagogical design leads from the current research, technological knowhow, hands-on design / development / deployment of learning objects, and online teaching and learning methods.
Techniques to optimize the PageRank algorithm usually fall into two categories. One is to try reducing the work per iteration, and the other is to try reducing the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged has the potential to save iteration time. Skipping in-identical vertices, with the same in-links, helps reduce duplicate computations and thus could help reduce iteration time. Road networks often have chains which can be short-circuited before PageRank computation to improve performance; final ranks of chain nodes can be easily calculated. This could reduce both the iteration time and the number of iterations. If a graph has no dangling nodes, the PageRank of each strongly connected component can be computed in topological order. This could help reduce the iteration time and the number of iterations, and also enable multi-iteration concurrency in PageRank computation. The combination of all of the above methods is the STICD algorithm [sticd]. For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
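As a concrete illustration of one technique above, skipping computation on already-converged vertices, here is a minimal sketch. The adjacency-list encoding, damping factor, and tolerance are illustrative assumptions; this is not the STICD implementation, and it ignores dangling-node handling for brevity.

```python
def pagerank_skip_converged(adj, damping=0.85, tol=1e-12, max_iter=100):
    """PageRank over an adjacency list {node: [out-neighbors]}; vertices whose
    rank change falls below tol are marked converged and skipped thereafter."""
    nodes = list(adj)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    # PageRank "pulls" rank from in-neighbors, so build a reverse adjacency once.
    rin = {v: [] for v in nodes}
    for u, outs in adj.items():
        for v in outs:
            rin[v].append(u)
    converged = set()
    for _ in range(max_iter):
        prev = rank
        rank = dict(prev)
        for v in nodes:
            if v in converged:
                continue  # saved work: this vertex's rank is frozen
            s = sum(prev[u] / len(adj[u]) for u in rin[v])
            rank[v] = (1.0 - damping) / n + damping * s
            if abs(rank[v] - prev[v]) < tol:
                converged.add(v)
        if len(converged) == n:
            break
    return rank
```

Freezing converged vertices trades a small amount of accuracy (a frozen rank no longer responds to late changes in its neighbors) for less work per iteration, which is exactly the tension the passage describes.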
Opendatabay - Open Data Marketplace.pptxOpendatabay
Opendatabay.com unlocks the power of data for everyone. Open Data Marketplace fosters a collaborative hub for data enthusiasts to explore, share, and contribute to a vast collection of datasets.
First ever open hub for data enthusiasts to collaborate and innovate. A platform to explore, share, and contribute to a vast collection of datasets. Through robust quality control and innovative technologies like blockchain verification, opendatabay ensures the authenticity and reliability of datasets, empowering users to make data-driven decisions with confidence. Leverage cutting-edge AI technologies to enhance the data exploration, analysis, and discovery experience.
From intelligent search and recommendations to automated data productisation and quotation, Opendatabay's AI-driven features streamline the data workflow. Finding the data you need shouldn't be complex. Opendatabay simplifies the data acquisition process with an intuitive interface and robust search tools. Effortlessly explore, discover, and access the data you need, allowing you to focus on extracting valuable insights. Opendatabay breaks new ground with dedicated, AI-generated synthetic datasets.
Leverage these privacy-preserving datasets for training and testing AI models without compromising sensitive information. Opendatabay prioritizes transparency by providing detailed metadata, provenance information, and usage guidelines for each dataset, ensuring users have a comprehensive understanding of the data they're working with. By leveraging a powerful combination of distributed ledger technology and rigorous third-party audits Opendatabay ensures the authenticity and reliability of every dataset. Security is at the core of Opendatabay. Marketplace implements stringent security measures, including encryption, access controls, and regular vulnerability assessments, to safeguard your data and protect your privacy.
Adjusting primitives for graph : SHORT REPORT / NOTESSubhajit Sahu
Graph algorithms, like PageRank, commonly operate on Compressed Sparse Row (CSR), an adjacency-list based graph representation that is compact and contiguous in memory.
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...John Andrews
SlideShare Description for "Chatty Kathy - UNC Bootcamp Final Project Presentation"
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
Explore our comprehensive data analysis project presentation on predicting product ad campaign performance. Learn how data-driven insights can optimize your marketing strategies and enhance campaign effectiveness. Perfect for professionals and students looking to understand the power of data analysis in advertising. for more details visit: https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/
2. PRESENTATION DESCRIPTION
• Virtually every type of online learning involves some type of data visualization. Some common data visualizations include timelines, process diagrams, line graphs, bar charts, pie charts, treemap diagrams, dendrograms, cluster diagrams, geographical maps, network graphs, word clouds, word networks, scatter diagrams, scatterplot matrices, intensity matrices, decision trees, and others. Indeed, there is also data in screenshots, photos, drawings, videos, or other types of visuals. Online dashboards contain rich data visualizations to convey dynamic data. Some data, such as big data, may only be conveyed in visuals for human understanding and interpretation; in raw form, the meaning is obscured and elusive. Data visualizations highlight salient aspects of data, and they have to be aligned for particular multi-uses: (1) user awareness and understanding, (2) data analytics, and (3) decision-making.
2
3. PRESENTATION DESCRIPTION (CONT.)
• This session defines some best practices for informative and engaging data visualizations for online learning. Original real-world examples are provided from modern software programs.
3
6. OVERVIEW
• Oversimplifications about Data, Information, and Data Visualizations
• Data Visualization Sampler (and Audience Interpretations)
• Data as Visualization
• Data Visualizations in Online Learning
• Defining “Effective” Data Visualizations
• Human Visual Perception
• Cognitive Theory of Multimedia Learning
• Steps to Creating Data Visualizations
• Conventions of Data Visualizations
• 2D
• 3D
• 4D
6
7. OVERVIEW (CONT.)
• Sequencing Data Visualizations
• Contextualizing Data Visualizations
• User Interactivity with the Data Visualizations
• About Data Visualizations and Decision-making
• About “Big Data” and Data Visualizations
• Some Quick Takeaways
• A Note about the Software
7
9. ABOUT DATA
• Anything that contains raw information
• May be structured (labeled data, such as in data tables)
• May be unstructured or semi-structured (such as imagery, text, audio, video, mixed media, and other non-traditional contents that contain informational value / extractable meaning)
• Structured data tend to follow the basics of row data as individual records and column data as attributes or variables
• Matrices may contain similar attributes in the columns (banners) and rows (stubs), depending on the type of matrix
9
10. ABOUT DATA (CONT.)
• Raw master files of data should be kept
• Data are generally parsed in the following ways: classification, frequency (or intensity), relationship [null, associational (negative or positive), causal, time relation (slice-in-time, over time, discrete time or continuous time, predictive or future-focused), and space relation], and others
10
11. ABOUT DATA (CONT.)
• Only as good as the sourcing and methods for collection
• Should be legally sourced and collected
• Should be accurately maintained and handled (with whatever levels of confidentiality required)
• Is only “good” for a particular time (but also valuable in time for historical purposes and base-lining and possible trend-line analysis)
• Is the base material on which research assertions and analyses are made
• May be grounds for fresh research hypothesizing
• May be a renewable resource in some circumstances and one-offs in others
11
12. ABOUT PUBLICLY SHARED DATASETS
• Full dataset may be shared online at the time of publication per grant funder requirements and some practices in some domains
• Shared dataset needs to be properly documented in terms of sources and methods and in terms of crediting
• This is often in a README file accompanying the data or included at the top-level of the dataset as a text box
• Data limitations and qualifiers need to be acknowledged
12
13. ABOUT PUBLICLY SHARED DATASETS
(CONT.)
• Data need to be cleaned:
• Repeated information should be omitted
• Outliers should be deleted (or mitigated)
• Data norming should be applied so that the meanings of disparate terms may be captured, and others
• Data may need to be re-structured for different types of data analytics in different software programs
13
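The cleaning steps above (dropping repeated information, norming disparate terms, handling outliers) can be sketched as a small routine. The record shape, column names, and z-score threshold below are illustrative assumptions; in practice, outlier handling depends on the data and the research design.

```python
from statistics import mean, stdev

def clean_records(records, value_key="score", norm_map=None, z_cut=3.0):
    """Basic dataset cleaning: drop exact duplicates, normalize disparate
    term spellings via a mapping, and remove z-score outliers."""
    norm_map = norm_map or {}
    # 1. Omit repeated information (exact duplicate records).
    seen, deduped = set(), []
    for rec in records:
        key = tuple(sorted(rec.items()))
        if key not in seen:
            seen.add(key)
            deduped.append(dict(rec))
    # 2. Norm disparate terms so the same meaning is captured under one label.
    for rec in deduped:
        for k, v in rec.items():
            if isinstance(v, str):
                rec[k] = norm_map.get(v.strip().lower(), v.strip().lower())
    # 3. Delete outliers beyond z_cut standard deviations from the mean.
    vals = [r[value_key] for r in deduped if value_key in r]
    if len(vals) > 1 and stdev(vals) > 0:
        m, s = mean(vals), stdev(vals)
        deduped = [r for r in deduped
                   if value_key not in r or abs(r[value_key] - m) / s <= z_cut]
    return deduped
```

A mitigation strategy (winsorizing, flagging) could replace the deletion step where outright removal would bias the analysis.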
14. ABOUT PUBLICLY SHARED DATASETS
(CONT.)
• Shared dataset data need to be properly labeled; the data need to be structured in conventional ways for ease-of-use and professionalism
• Columns as variables, rows as individual data entries
• Data need to be versioned in multiple formats for download and sharing
• Data need to be de-identified and made robust against re-identification (avoidance of data leakage)
14
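As one concrete (and deliberately simple) approach to the de-identification point above: direct identifiers can be replaced with salted-hash pseudonyms, so records remain linkable to each other but not back to a person without the secret salt. The field name and salt here are placeholders; real robustness against re-identification also requires handling quasi-identifiers (e.g., k-anonymity), which this sketch does not attempt.

```python
import hashlib

def deidentify(records, id_key="email", salt="replace-with-secret-salt"):
    """Replace a direct identifier with a salted-hash pseudonym so the same
    person maps to the same pseudo_id, but the raw identifier is removed."""
    out = []
    for rec in records:
        rec = dict(rec)                      # avoid mutating the caller's data
        raw = rec.pop(id_key, None)
        if raw is not None:
            digest = hashlib.sha256((salt + str(raw)).encode()).hexdigest()
            rec["pseudo_id"] = digest[:12]   # short, stable pseudonym
        out.append(rec)
    return out
```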
15. ABOUT INFORMATION
• Is an extraction from raw data, and is more processed (filtered, cleaned, selective) than raw data
• Contains some interpretation and framing
• Contains applied value for human use and benefit for awareness, decision-making, and other applications
• Should be accurate and avoid any sort of mis-representation, even by nuance or false inference
15
16. ABOUT DATA VISUALIZATIONS
• Are a purposive and selective data summarization (of the underlying data), and they generally contain particular dimensions or facets of the underlying data
• May be linked to the underlying data (for reproducibility)
• Involve titles, shape labels, callouts, data labels, keys / legends, and colors / shading (to disambiguate the information)
• May involve moved data to avoid occlusion
• May include picture-in-picture layout
16
17. ABOUT DATA VISUALIZATIONS (CONT.)
• Include visual aesthetic style elements
• Lines, shapes
• Color palettes
• Backgrounds
• Fonts
• Are usually stand-alone but also may be used in an original context (so may have dependencies)
• Follow particular data visualization conventions and common practices
17
18. ABOUT DATA VISUALIZATIONS (CONT.)
• May be 2D (x- and y-axes), 3D (x-, y-, and z-axes), and 4D (x- and y-axes and time; x-, y-, and z-axes and time as the 4th dimension)
• Should follow all laws
• Should respect intellectual property and not contravene IP rights
• Should also give credit where it is due
• Should respect privacy rights and not contravene privacy rights
• Should have legal and signed media releases for all depictions of people’s likenesses
• Should be accessible, with the information available in multiple modalities
18
19. ABOUT DATA VISUALIZATIONS (CONT.)
• May be drawn from different sources:
• raw data: structured, unstructured, semi-structured
• synthetic (faux) data
• processed information
• theory(ies)
• model(s)
• projection(s)
• concepts
• May be drawn from a combination of sources
• The underlying sources and the visuals inform understandings of the data visualization and the confidence that may be applied
19
20. ABOUT DATA VISUALIZATIONS (CONT.)
• May be created in a number of ways:
• manually drawn with diagramming tools, note-taking tools, tablet drawing programs and styluses
• may be pre-planned or drawn on-the-fly (spontaneously) in a freeform way
• drawn by machine based on both data and various computer algorithms
• statistical analyses (correlations, chi-square test, simple regression, multiple regression, t-tests, ANOVAs, sign tests, and others)
• cluster (similarity / dissimilarity) analysis
• machine learning or computational identification of patterns in data
20
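Two of the statistical procedures named above, correlation and the chi-square test, can be illustrated at the level of the raw statistics (p-value lookup omitted). This standard-library sketch uses invented data and is meant only to show the computations that statistical and visualization tools wrap.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def chi_square_stat(table):
    """Chi-square statistic of independence for a 2D contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    return stat
```

In a real workflow, these numbers feed directly into the visualizations: a correlation coefficient annotates a scatterplot, and a chi-square statistic accompanies a cross-tab heat map.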
21. ABOUT DATA VISUALIZATIONS (CONT.)
• May be created in a number of ways: (cont.)
• drawn by computer program (cont.)
• agent-based modeling
• data modeling
• simulation
• virtual immersive worlds, and others
• and often created with a mixed sequence, such as some computational data visualization augmented by manual data labels and other visual overlays
21
23. WHAT DO THE FOLLOWING DATA VISUALIZATIONS SHOW?
• The following data visualizations are based on education-seeded datasets and various software programs.
• The data sources include the following: curated text sets, LMS data portal data, social media datasets, crowd-sourced encyclopedias, non-consumptive text analysis data, and others.
• The data visualizations are labeled by the following: (1) data, (2) data visualization type, and (3) software technology.
23
24. GENERAL STEPS TO RESEARCH AND THE ROLES OF DATA VISUALIZATION
24
58. Narrowcasting vs. Broadcasting to Conceptual Audiences
(social image set coding, by hand; data coded by audience type and frequency)
Area Chart
Excel 2016
58
76. An IT Satisfaction Survey
Cross-Tabulation Analysis
(with chi-squared scores, p-values)
Qualtrics
76
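The cross-tabulation behind a slide like the one above amounts to counting response pairs into a contingency table. The field names ("dept", "sat") below are invented for illustration; Qualtrics performs this aggregation (plus the chi-squared scores and p-values) internally.

```python
from collections import Counter

def cross_tab(responses, row_key, col_key):
    """Build a cross-tabulation (contingency table) of counts from survey
    response records, e.g. satisfaction level by department."""
    counts = Counter((r[row_key], r[col_key]) for r in responses)
    rows = sorted({r[row_key] for r in responses})
    cols = sorted({r[col_key] for r in responses})
    table = [[counts[(ri, ci)] for ci in cols] for ri in rows]
    return rows, cols, table
```

The resulting table can then be passed to a chi-square routine to test whether the row and column variables are independent.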
77. THIRD-PARTY TOOLS ACTIVATED IN K-STATE INSTANCE OF CANVAS LMS IN DESCENDING ORDER
77
78. Third-Party Tools Activated in Canvas LMS at K-State
Statistic Chart / Pareto Chart (sorted histogram)
[items in descending order along x-axis; raw number counts (left) and percentages (right) on the 2 y-axes]
(orange curve a cumulative aggregation of items comprising the set)
MS Excel 2016
78
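The Pareto chart described above pairs descending raw counts with a cumulative-percentage curve. The computation behind it can be sketched as follows (the charting itself is left to Excel or a plotting library); the input dictionary is an invented stand-in for the tool-activation counts.

```python
def pareto_series(counts):
    """Given {item: count}, return items sorted descending with raw counts
    and the cumulative-percentage curve a Pareto chart overlays."""
    items = sorted(counts, key=counts.get, reverse=True)
    total = sum(counts.values())
    cumulative, running = [], 0
    for item in items:
        running += counts[item]
        cumulative.append(100.0 * running / total)
    return items, [counts[i] for i in items], cumulative
```

The raw counts drive the bars against the left y-axis, and the cumulative list drives the orange curve against the right (percentage) y-axis.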
87. VARIANT BILL-OUTS FOR SET OF INSTRUCTIONAL DESIGN PROJECTS ON A UNIVERSITY CAMPUS
87
88. Billing Data for Instructional Design Projects
Scattergraph with Lines
MS Excel 2016
88
89. Billing Data for Instructional Design Projects
Treemap
MS Excel 2016
89
90. TIME-TO-EVENT ANALYSIS (FORMERLY “SURVIVAL ANALYSIS”) OF INSTRUCTIONAL DESIGN PROJECTS AND TIME WHEN A PROJECT ACHIEVES EVENT (IS PAID OUT) OR IS CENSORED (DOES NOT ACHIEVE EVENT DURING THE RESEARCH PERIOD)
91. Instructional Design Billing Data
Kaplan-Meier Curve / Line Graph
(based on “survival analysis”)
IBM’s SPSS Statistics
92. Instructional Design Billing Data
Line Chart
(based on “survival analysis,”
non-descending stepwise curve)
IBM’s SPSS Statistics
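The Kaplan-Meier curves above come from SPSS; for readers curious about the mechanics, here is a minimal sketch of the product-limit estimator. The durations are invented; each observation pairs a time with whether the payout event was observed (True) or censored (False):

```python
# Minimal sketch of the Kaplan-Meier product-limit estimator.
# Each observation: (time, event_observed); event_observed=False means censored
# (the project did not reach payout during the research period).

def kaplan_meier(observations):
    """Return [(event_time, survival_probability), ...] as a step function."""
    at_risk = len(observations)
    survival = 1.0
    curve = []
    for t in sorted({time for time, _ in observations}):
        events = sum(1 for time, e in observations if time == t and e)
        censored = sum(1 for time, e in observations if time == t and not e)
        if events:
            survival *= 1 - events / at_risk  # product-limit update at each event time
            curve.append((t, survival))
        at_risk -= events + censored          # censored cases leave the risk set
    return curve

# Hypothetical project durations in months
print(kaplan_meier([(2, True), (3, True), (3, False), (5, True), (8, False)]))
```

Plotting these (time, survival) pairs as a non-ascending stepwise line reproduces the curve SPSS draws.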
100. Some Cities Visited over the Years
(by city, state, and country…
and numbers of visits)
3D Map (interactive, zoomable)
Microsoft Excel 2016 / Bing Maps
106. DATA -> VISUALIZATION
DATA
• What / Entity
• Frequency / Intensity (How Much?)
• Relationships (Association, Causation, Hierarchical, and Others)
• Slice-in-Time
• Changes over Time
VISUALIZATION
• Shape
• Size, Thickness, Height
• Connected Lines, Scatter in Space, Tree Structure Diagrams, and Others
• Time Label, Time Indicator
• Line / Scatter over the X-axis
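The data-to-visualization pairings above can be read as a simple lookup from data characteristic to visual encoding. The sketch below is purely illustrative; the mapping just restates the slide's two columns and is not a formal rule:

```python
# Illustrative sketch: mapping data characteristics to visual encodings,
# following the pairings suggested on the slide (not a formal standard).

DATA_TO_VISUALIZATION = {
    "entity": "shape",
    "frequency / intensity": "size, thickness, height",
    "relationships": "connected lines, scatter in space, tree structure diagrams",
    "slice-in-time": "time label, time indicator",
    "changes over time": "line / scatter over the x-axis",
}

def suggest_encoding(data_aspect):
    """Return a candidate visual encoding for a named data aspect."""
    return DATA_TO_VISUALIZATION.get(data_aspect.lower(), "no convention listed")

print(suggest_encoding("Changes over time"))  # line / scatter over the x-axis
```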
108. COMMON FORMS
• Timelines
• Bar Charts, Pie Charts, Line Charts, and
Others
• Models (Venn Diagrams, Figures)
• Geographical Maps
• Photos / Imagery
• Simulations
• 3D Immersive Virtual Worlds
• 4D Immersive Virtual Worlds
• Games
• Video
109. ONLINE LEARNING CONTEXTS
• Online learning includes both term-length courses and short courses (such as for trainings).
• In online learning, instructors maintain a level of telepresence through interactions and
intercommunications with learners.
• Learners maintain some level of inter-communications with their peers. They co-create
learning communities to support each other’s learning.
• In an online learning context, learners have to be somewhat self-driven and self-directed.
• Given that online learning occurs via the Web and Internet, learners have easy access to online
resources: digital libraries, websites, immersive virtual spaces, online datasets, and other
contents.
• Depending on the sociability of others, they will have access to experts and peers to engage
with about various topics.
110. ONLINE LEARNING CONTEXTS (CONT.)
• The nature of the online learning context means that online learners will have
access to other datasets and data visualizations related to the same
information…and other perspectives and points-of-view.
• Ostensibly, they’ll be able to see if data visualizations are borrowed and reproduced
from elsewhere (through reverse image search, through basic Web image search).
• They’ll be able to access public datasets.
• They’ll be able to see if there are different datasets, data visualizations, and different
understandings and interpretations of the issue.
111. REQUIRED LEARNER RESPONSES
• The data visualization(s) need to be designed so that learners do the following:
• Pause, not just blitz past
• Engage with the visualization (and interact for the interactive visualizations)
• Extract accurate meaning for the learning
• Reflect
• Follow-through on learning activities
• Experience inspiration
113. EFFECTIVE DATA VISUALIZATIONS…
• represent the selected underlying data accurately based on the inherent form
and structures in the underlying data and on user needs (and control against
misperceptions and misunderstandings);
• highlight relevant aspects of the underlying data;
• convey information in an aesthetically pleasing way (to attract human attention
and to increase the memorability of the visualization and the underlying
information);
• align with conventions of the respective data visualizations (directionality of
reading, respective sizes of elements, placement of elements in relation to each
other, naming and labeling protocols, perspective, and other aspects);
114. EFFECTIVE DATA VISUALIZATIONS…
(CONT.)
• maintain consistency both within and across related data visualizations;
• are accessible in terms of element labeling, text readability, image resolution,
and uses of color [proper contrast, proper color palettes, applied fill, and
way(s) to convey information beyond color];
• are presented in a contextualized way, including access to information about
the underlying research, data collection, and data cleaning;
• avoid unnecessary (read: purely decorative, non-information-bearing)
elements, and
• occasionally connect to the underlying data (data portals, interactive web-
based data visualizations), among others.
116. SOME MECHANICS OF
VISUAL PERCEPTION
• The human visual perception system includes the eyes (cornea, lens, and retina), the
optic nerves, and visual paths in the brain to process light information.
• The retina contains 150 million light-sensitive rod and cone cells
• In the brain, there are hundreds of millions of neurons that process visual information (“and
take up about 30 percent of the cortex, as compared with 8 percent for touch and just 3
percent for hearing”)
• Optic nerves consist of “a million fibers” each (Grady, June 1, 1993, “The Vision
Thing: Mainly in the Brain,” Discover)
• Based on the eye’s structure, its focal vision is powerful, but peripheral vision is
very limited.
117. PRE-ATTENTIVE PROCESSING
• The human visual perceptual system first captures visual information in a pre-attentive
and subconscious way (“Pre-attentive processing,” Nov. 29, 2016).
• Based on interest and training, a person may then focus attentively on the visual
stimulus.
• Subconsciously and unconsciously acquired details of the world can affect the person
and his / her decision-making whether he / she is consciously aware of the details or
not.
118. VISUAL SIGNALS BEYOND THE PHYSICAL
• Perceptual signals do not only come from the world but also from the mind
and body (internally).
• Vision, though, is informed by the prior experiences (prior observed patterns)
of the individual.
• One researcher suggests that in visual perception: 40% comes from visual signals, and
60% comes from prior experiences and memory (Catmull, 2014, Creativity, Inc.:
Overcoming the Unseen Forces that Stand in the Way of True Inspiration, p. 178).
119. EIDETIC MEMORY?
• “Eidetic” memory refers to the ability to recall mental images with high detail.
• Some people, particularly a subgroup of children, are able to view memories
like photos for some minutes.
• Photographic memory, though, has not been established empirically and is not
currently thought to exist.
120. A VISUALLY DETAILED AND
INFORMATIVE WORLD
• Human visual imagistic representations of the world (in the mind) are not that
inherently informative.
• Human visual memory seems so powerful because the world itself serves as an
“outside memory” (O’Regan, Sept. 1992).
• A common eloquent expression of this idea is that “the world is its own memory.”
121. VISUAL THINKING
• “Visual thinking” refers to human intelligence and imagination which enables
people to conceptualize in imagery, not just language.
• “Visual literacy” refers to the ability to discover meaning from imagery.
• There is research that people interpret artworks in a predictable manner, even
across “a wide range of cultural and socioeconomic contexts” (Housen, 1992a,
2000, 2002; Housen, DeSantis, & Duke, 1997, as cited in Housen, 2007, p. 2),
which may suggest a hard-wired biological basis.
• The stages are as follows:
• (1) “accountive” with “simple, concrete observations”;
122. VISUAL THINKING (CONT.)
• The stages are as follows (cont.):
• (2) “constructive” based on perceptions,“knowledge of the natural world,” “the values
of their social and moral world,” with observations based on known reference points;
• (3) “classifying” with viewers acting as “art historian” by placing the artwork in a
context of conventions and art history canons;
• (4) “interpretive” based on “interactive and spontaneous” encounters with the
artwork, and
• (5) “re-creative” by reflecting about art and suspending belief in order to see the
work as “semblant, real, and animated with a life of its own” (Housen, 2007, pp. 3 – 8)
123. PRIOR EXPERIENCES
WITH DATA VISUALIZATIONS
• If people’s visual systems are trained by the human built environment and their
exposure to familiar forms, so, too, are people’s systems trained by prior exposures
to data visualizations.
• Some common expectations for data visualizations:
• Start at the top and read down. Start at the left and read right.
• Size means visual salience and importance.
• Color and boldness means visual salience and importance. Bright colors are warning colors.
• Movements (changing numbers, scrolling data, and others) are attention-getting.
• Eye movements often track with shapes and lines.
124. SOME IMPLICATIONS
FOR DATA VISUALIZATIONS
• Data visualization conventions should be followed.
• Human tendencies to read stories and meanings into every element of a data
visualization should be understood and supported. This means that no excess or
misleading information should be included.
• The human eyes’ capabilities to detect nuance should be catered to. It may be
helpful to add gridlines and other details to enhance understanding of a graph.
• Whatever visual elements in a data visualization should work together
harmoniously, and they should not clash or engage competitively for human
attention.
• All measures should be consistently applied across the data visualization.
126. MAIN THEORISTS AND THEORIES
• Richard Mayer’s Cognitive Theory of Multimedia Learning (2002):
Engaging cognitively involves costs to the learner.
• (1) Intrinsic cognitive load is related to the difficulty of the topic-to-be-learned.
• (2) Extraneous cognitive load is based on how information is designed and
presented.
• (3) Germane cognitive load is dependent on “the processing, construction and
automation of schemas” (schemas being frameworks for understanding parts of the
world). There are ways to design multimedia to align with human cognitive limits to
lighten cognitive loads to enhance learning.
127. MAIN THEORISTS AND THEORIES (CONT.)
• John Sweller’s Cognitive Load Theory (1988): “Means-ends analysis”
imposes a high cognitive load, and those who teach can lighten the
load for learners by offering organizing schemas, “worked examples,” and
“goal-free problems.”
• Allan Paivio’s Dual-Coding Theory (1960s / 1971): Humans process
information through separate auditory and visual channels. Verbal (word,
symbolic) and non-verbal (visual image) information is processed in different
channels.
128. IMPLICATIONS ON DATA
VISUALIZATION DESIGN
• Cognitive Theory of Multimedia Learning
• Complex topics should be unpacked in a clear way to limit intrinsic cognitive load.
“Extraneous cognitive load” should be avoided through effective design.
• Data visualizations should never be purely decorative (because decorative-only
visuals may be distracting).
• Data visualizations should have main relevant aspects highlighted and noted, to lower
germane cognitive load. Learners should not be given confounding data visualizations
without clear meanings.
• Cognitive scaffolding should be designed into the data visualizations about topics with
high intrinsic cognitive load.
129. IMPLICATIONS ON DATA
VISUALIZATION DESIGN (CONT.)
• Cognitive Load Theory
• Data visualizations should be placed in the context of a relevant framework in a
particular learning domain or context. The data should be presented in the context
of accepted schemas.
• Ambiguity requires cognitive load to process, so if learners need to apprehend a data
visualization right away, it should be presented as a “worked case.” Problems, when
presented, should be “goal free” and pre-solved in many cases.
130. IMPLICATIONS ON DATA
VISUALIZATION DESIGN (CONT.)
• Dual-Coding Theory
• Data visualizations presented to learners should not rely on purely verbal or purely
non-verbal channels.
• There should be a balance in the modality of information, so learners can process the
information appropriately.
• There are contested ideas about how much redundancy across channels should be
deployed to convey information. Too little coding may leave the learner with
insufficient information; excessive redundancy may cause expensive cognitive overload
with unnecessary excess.
132. REVIEW: 10 STEPS TO CREATING DATA
VISUALIZATIONS
1. Analyze the data
2. Clean / process the data
3. Select the data aspect(s) to highlight
4. Structure the data for the visualization
5. Create initial data visualizations
6. Analyze the data further
7. Add data labels, title, key / legend, and other elements
8. Pilot-test the data visualizations (stand-alone)
9. Pilot-test the data visualizations (in context)
10. Finalize the data visualizations
(as seen on Slide 25)
133. DEBRIEFING THE SEQUENCE
• A data visualization begins with intimate
knowledge of the underlying data.
• Data often has to be processed in the
correct format for visualization.
• Data visualizations are used partially as a
data exploration method.
• Data are often processed in multiple
different methods…and even in multiple
different software programs in order to
see what may be learned from the data.
• Depending on aesthetics, some may
process data in one tool and export the
resulting data tables and / or other digital
artifacts for final processing in other
software programs.
• There are data visualization drafts
created before a final one is output (for
presentation).
• Data visualizations have to be human-
readable and human-usable.
134. MORE TO THE STORY…
• To create relevant data visualizations, those who would design data visualizations
need to understand the following:
• the underlying data and prior research
• the statistical assumptions
• the conventions of the particular data visualizations
• the target audiences (and the incidental audiences)
• the socio-cultural and geographical backgrounds of the target and incidental audiences (in
order to avoid miscommunications and potential offense)
• the requirements (color processing, resolution, and others) and technical versions of the
imagery needed for digital distribution and print
135. QUALITY STANDARDS FOR DATA VISUALIZATIONS
RESEARCH STANDARDS…
• Following legal standards for research and data collection, including professional oversight, informed consent, candor, benevolence, and others
• Following legal standards for data handling and storage
• Following legal standards for privacy protections of research participants (and data)
• Following legal standards for information accuracy (and controlling for negative understandings)
• Following professional guidelines for integration of mixed data from various datasets
• Minimizing interpretive skew
136. QUALITY STANDARDS FOR DATA VISUALIZATIONS (CONT.)
HISTORICAL ACCURACY
• Ensuring that the research work was as solid as possible given the contemporaneous limits of time, talent, treasure, technologies, and methods
• Ensuring that the data may be used in the future, based on future-created capabilities
PROFESSIONAL USE
• Using data and data visualizations in ethical and professional ways
• Providing benefit in the deployment of data and data visualizations
• Avoiding harm in the deployment of data and data visualizations
• Providing full disclosure in the provision of information
137. QUALITY STANDARDS FOR DATA VISUALIZATIONS (CONT.)
INTELLECTUAL PROPERTY (IP)
• Creating original contents using materials and data that one has legal rights to use
• Using software that is legally acquired
• Giving credit where it is due (such as in cases of open-source and / or Creative Commons-released materials)
• Avoiding contravening others’ intellectual property
• Doing due diligence to identify ownership of works (even for “orphaned” works)
PRIVACY PROTECTIONS
• Acquiring informed consent from all participants in research (and maintaining accurate and up-to-date documentation of these permissions)
• Acquiring media releases for uses of people’s likenesses (such as for audio, video, and other recordings and captures)
• Protecting data (both in transit and at rest)
• De-identifying data where necessary (to the standard that re-identification is not possible)
138. QUALITY STANDARDS FOR DATA VISUALIZATIONS (CONT.)
ACCESSIBILITY…
• Ensuring that all data visualizations are available to users in multi-modal channels (visual, textual / audio)
• Channels should offer equal informational value
• Ensuring that 4D data visualizations (with the time element) may be controlled by users (so the timing may be slowed or stopped, for easier usage)
• Ensuring that data tables may be read coherently by screen readers
• Ensuring that color is not used as the only channel for information conveyance (for those with color-blindness)
• Using high-contrast colors to enable accurate visual uptake of information, and others
139. QUALITY STANDARDS FOR DATA VISUALIZATIONS (CONT.)
REPRODUCIBILITY
• Enabling access to the underlying data behind data visualizations
• Enabling contemporary and future researchers to explore the data for accuracy and applicability to other contexts (and through other interpretive lenses)
• Enabling multiuse data
REPEATABILITY
• Enabling other researchers to go through the same steps as the original researcher to come out with the same results from the dataset(s)
140. ADDITIONAL COMMON ERRORS IN
DATA VISUALIZATIONS
The Data
• Introducing error in data handling, data processing, and / or data cleaning
• People who work too quickly may accidentally delete or corrupt information if they
are careless in their work
• Using an unaligned data visualization type for the underlying data
• It’s easy to get a software program to output a data visualization without actually
understanding what is going on with the data or in the software
141. ADDITIONAL COMMON ERRORS IN
DATA VISUALIZATIONS (CONT.)
The Data (cont.)
• Using high-density data that may overpower the data visualization
• Excessive nodes in a network will make the network unreadable
• Insufficient understanding of the limits of assertions that may be made with
that visualization
• Not remembering that data visualizations are summary data, not comprehensive (in
most cases)
• Not remembering that data visualizations are inherently ambiguous and polysemous
(multi-meaninged) and can be interpreted in different ways by different beholders
142. ADDITIONAL COMMON ERRORS IN
DATA VISUALIZATIONS (CONT.)
The Data (cont.)
• Labeling data visualization elements incorrectly
• Insufficient labeling of visual elements (such as data labels) in the data visualization
• Incorrect labeling of data elements (confusing rates over time with set amounts)
• Using mixed measures in data
• Not using consistent time measures
143. ADDITIONAL COMMON ERRORS IN
DATA VISUALIZATIONS (CONT.)
The Data (cont.)
• Not considering language
• Simple English reads better and translates better
• Parallel construction should be applied to all language use in a data visualization,
especially since language is so sparse and powerful in a data visualization
• Spelling should be correct
• The language used should be consistent and aligned with the research context
144. ADDITIONAL COMMON ERRORS IN
DATA VISUALIZATIONS (CONT.)
The Software
• Using software without understanding the software
• Researchers will use software programs without reading the manuals and the underlying
documentation (or they’ll go to forums before they go to the actual documentation)
• Of course, some software makers do not document as well as they should (most will not
reveal underlying algorithms, for example)
• Researchers need to understand the software programs they’re using, particularly for
coding and analysis
• They need to represent what they learned while using the software, not just mention that
they used the software (as if that would lend their work credibility)
• Shabby work reads as shabby, and name-dropping a software tool will not make things
better
145. ADDITIONAL COMMON ERRORS IN
DATA VISUALIZATIONS (CONT.)
Data Visualization Conventions
• Not understanding the conventions of a data visualization
• Spatial relationships in 2D, 3D, and 4D planes
• Oftentimes, people will combine 3D and 2D, breaking the illusion of the z-axis
• Shapes and meanings
• Color applications
• Lines: thickness, color, interruptions, line ends, and others
• Symbology and iconography
• Textures and patterns
146. ADDITIONAL COMMON ERRORS IN
DATA VISUALIZATIONS (CONT.)
Contextual Details
• Not offering sufficient contextual details to fully understand a data visualization
• Not including data parameters for data processing in a data visualization
• Not indicating that a data visualization is conceptual vs. empirical
• Not labeling synthetic or faux data as such
• Misrepresentations of information
147. ADDITIONAL COMMON ERRORS IN
DATA VISUALIZATIONS (CONT.)
Going Glam / Not Going Glam
• Using data visualizations that are glamorous (read: 3D) but which
misrepresent data
• Misplacement of data on the x, y, or z axes
• Occlusion of visual data
• Not considering aesthetics
• Using mixed color palettes (or using colors without any consistency or strategy)
• Using poor aspect ratio (stretching data visualizations)
• Not designing for white space (by overloading a data visualization)
148. ADDITIONAL COMMON ERRORS IN
DATA VISUALIZATIONS (CONT.)
Audience Needs
• Not fully considering audience needs [such as by running pilot tests; such as
building for both expert and general audiences (simultaneously)]
• Visual perception needs; cognitive and symbolic processing needs (symbols, language);
accessibility needs
• Learner developmental stage needs (with implications for the data visualization and
the sequence of related data visualizations)
• Informational needs
• Technological needs (such as viewing data visualizations on mobile devices and
smartphones with small screens)
149. ADDITIONAL COMMON ERRORS IN
DATA VISUALIZATIONS (CONT.)
Designing for Usage Contexts
• Incomplete consideration of various contexts in which the data visualizations
may be used
• Insufficient consideration for both stand-alone (the disaggregation of elements in
online learning) and in-context usages of the data visualization
• And others
150. QUALITY APPROACHES
TEMPLATING
• For those working on projects, it helps to…
• define comprehensive data visualization standards early in a project stylebook
• use prototypes of data visualizations and images and test these with people who are similar to those who will ultimately consume the data visualizations
• create evolving data visualization templates for use during the lifespan of the project
CLEAR DOCUMENTATION AND STORAGE OF RAW FILES
• It is important to keep clear documentation of all work and how the data visualizations were created
• It is important to keep all raw files, especially data ones, for re-do’s as needed
152. SOME COMMON DATA VISUALIZATION
CONVENTIONS
• There is a sense that there is an optimal amount of data for a particular data
visualization. Excessive data makes a data visualization hard to read or confusing;
too-sparse data makes a data visualization feel incomplete.
• Data visualizations are generally read from top-to-bottom and left-to-right.
• Timelines are read either from top-to-bottom or left-to-right, for example.
• In hierarchical data visualizations, there are typical ways to interact with them, from general
to specific or specific to general.
• Sunburst diagrams are generally read from general to specific.
• Dendrograms are generally read from leaf to branch to trunk to root (specific to
general).
153. SOME COMMON DATA VISUALIZATION
CONVENTIONS (CONT.)
• If there is a sequence of data visualizations, these are usually presented from
simple to complex.
• The opposite sequence can also be applied.
• In linear regressions, the x-axis is usually time, and the y-axis is the variable.
Or, both axes can be variables.
• Data may be engaged with at varying levels of granularity. Less specific data
may not be labeled, but at finer levels of granularity, data labels are often used.
• Data tables may be published out with the data visualization because of the
imprecision of human vision in assessing finer distinctions in visualizations.
154. SOME COMMON DATA VISUALIZATION
CONVENTIONS (CONT.)
• Time is often an important part of a data visualization, whether time is treated
as discrete (slice-in-time), periodic (in phases), or as continuous. Time is an
important variable in all research.
• Data visualizations are usually named (titled) for easier reference.
• Titles are generally factual and descriptive.
• They are typically in the form of noun phrases.
• Some titles point to the main gist of the data visualization.
• Titles are usually written in title case, with all main words capitalized and prepositions
and articles in lower case.
155. SOME COMMON DATA VISUALIZATION
CONVENTIONS (CONT.)
• Some data visualizations are offered along with underlying datasets that inform the
data visualization—for reproducibility of the data visualization (and for enriched
research using the shared data).
• In some cases, datasets are offered along with the R or other high-level computer language
script used for the data visualizations, so users may experience the data visualizations in
interactive ways.
• If external data are used, the data source should be cited. (Many publicly shared
datasets come with desired citations. Some of these may have to be tweaked to
follow the proper citation method of the target publisher.)
• If external data are processed or intermingled with other data, that should be done
with finesse (so as not to corrupt the data). How that was done should be clearly
documented.
156. SOME COMMON DATA VISUALIZATION
CONVENTIONS (CONT.)
• Aesthetically, data visualizations are created with a proper balance of filled-in spaces
and white spaces.
• Borders and edges are not usually included, but site designers and publication
production personnel decide applications during the design process.
• Color palettes are deployed for both aesthetics and for accessibility.
• Color palettes may be polychromatic (multi-colored) or monochromatic (one color in
different shades).
• Sufficient color selection (avoiding colors that are imperceptible for those with color
blindness challenges) and proper contrast are important for visual accessibility.
• Color should never be the only conduit for information; labels should be used strategically
to convey meaning.
• Color (arrayed along the light spectrum) affects viewer perceptions, including their moods.
• Cultural backgrounds may also affect the inherent meaning in colors.
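The "proper contrast" point above can be checked computationally. The sketch below implements the WCAG 2.x relative-luminance and contrast-ratio formulas, which are not part of these slides but are a widely used accessibility standard:

```python
# Sketch of the WCAG 2.x relative-luminance and contrast-ratio formulas,
# often used to verify that data-visualization colors have proper contrast.

def relative_luminance(rgb):
    """rgb: (r, g, b) with channels 0-255; returns relative luminance per WCAG 2.x."""
    def linearize(channel):
        c = channel / 255
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = (linearize(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(rgb1, rgb2):
    """Contrast ratio between two colors; WCAG AA asks for >= 4.5:1 for normal text."""
    lighter, darker = sorted([relative_luminance(rgb1), relative_luminance(rgb2)], reverse=True)
    return (lighter + 0.05) / (darker + 0.05)

print(round(contrast_ratio((0, 0, 0), (255, 255, 255)), 1))  # 21.0 (black on white)
```

A designer could run candidate palette pairs through `contrast_ratio` before committing them to a chart.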
160. SOME COMMON DATA VISUALIZATION
CONVENTIONS (CONT.)
• Most data visualizations use only one or two font styles.
• Font sizes tend to be within a certain size range, so that there are not huge
differences in sizes, particularly for shared and similar types of data.
• Texts in data visualizations are hierarchical and structured (even though they’re not
generally tagged within the data visualization currently).
• The location (position) of the text may be indicative of its importance.
• The larger the font, the more important the data.
• The font sizes of titles may be quite a bit larger than other font sizes used in a
data visualization because of the title’s central role in the visualization.
161. SOME COMMON DATA VISUALIZATION CONVENTIONS (CONT.)
• Font types in data visualizations tend to be sans serif for easier readability.
• Whenever possible, it’s a good idea to have labeling text right-side-up for readability.
• In some rare cases, it is allowable to have text in various directions, but this is not
generally desirable.
• (Note: Most software programs that enable data visualizations will have some helpful
presets. Data designers break the presets at their own risk.)
162. SOME COMMON DATA VISUALIZATION
CONVENTIONS (CONT.)
• More complex data may be represented in interactive data visualizations.
• For example, data visualizations in software programs, data dashboards, and some websites may
offer the ability to access the underlying data.
• Some simulations enable the input of different data parameters in order to see how these differing
inputs can affect outputs.
• When interactive data visualizations are exported as static images, there is always capability
loss and related data loss. (To preserve the information, it would help to export multiple static
images along with discussion points.)
• Data visualizations should be optimized to the various modes of usage (in print, on screen, and
so on). To these ends, these should be versioned for proper resolution, color type, file types,
and so on. For many print publications, only b/w or grayscale are applied to data visualizations
(because of the prohibitive costs of using various ink colors).
163. SOME COMMON DATA VISUALIZATION
CONVENTIONS (CONT.)
• Data visualizations may be used in whole or in part.
• Data visualizations may be designed for audience needs in two ways:
• To fulfill the needs of a narrow audience focused on specific information from the
data
• To fulfill the needs of a broad audience based on a wide range of user needs from the
data
164. SOME COMMON DATA VISUALIZATION
CONVENTIONS (CONT.)
• In publication, data visualizations are not usually bylined within-image.
• Bylines may be given in the publication, unless the author of the work created
the data visualizations.
• All datasets and data visualizations have limitations, and it is better to have
such limitations addressed as part of the publication process.
• This is addressed in the “delimitations” section. Data qualifiers should be included
with the data visualization. The level of confidence linked to the underlying data and
data visualizations should be addressed.
166. BASICS ABOUT 2D DATA
VISUALIZATIONS
• 2D data visualizations exist on a flat two-dimensional plane.
• The planes are usually squares or rectangles (quadrilaterals). Within the area,
various types of data visualizations may be displayed.
• Generally 2D data visualizations are understood to have an x-axis and a y-axis
(such as linear regression graphs, bar charts, and others). In some cases, the x-
and y- axes do not apply since the visualization may be rotated and maintain
the same meaning (such as some forms of network graphs, bubble diagrams,
and others).
168. BASICS ABOUT 3D DATA
VISUALIZATIONS
• Three-dimensional (3D) data visualizations are drawn in a space that involves
volume (not just area) and three dimensions: an x-axis, a y-axis, and a z-axis.
• In many software tools, the 3D effect is created with shading and the
appearance (illusion) of a third dimension.
• Such visualizations tend to be rotate-able and zoomable for clarity.
• People are not thought to process 3D data very well because of challenges
with occlusion and visual ambiguity.
• Often, 3D visualizations may also be offered in 2D.
170. BASICS ABOUT 4D DATA
VISUALIZATIONS
• The fourth dimension is conceptualized as time. For data visualizations, this
means changes over time.
• Changes over time may be seen in spaces that are two-dimensional or three-
dimensional.
• Time may be discrete (a particular slice-in-time), phased (into periods), or
continuous.
• Time may be presented in sequential order or reverse-sequential order, in terms of
phased or continuous time.
• In data visualizations, time may be run forwards or backwards in simulations,
virtual immersive worlds, and video.
172. ORDER AT THE MICRO-LEVEL
• In processing data for a data visualization, there are common micro-level
organizational aspects. They may include the following:
• alphabetization (letter order)
• numerical order (positioning in a list, rank, others)
• simple to complex, complex to simple (complexity)
• smallest to largest, largest to smallest (size)
• chronological date order, reverse chronological data order (date)
• top to bottom, bottom to top, left to right, right to left, outside in, inside out (spatial)
• most to least, least to most (amount)
• categorization (type)
• These ideas apply to the organization of data visualizations as well, in terms of
providing guidance on how such visualizations may be sequenced.
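The micro-level orderings above map naturally onto sort keys. A small illustrative sketch (the items and their fields are hypothetical, not drawn from any dataset in this deck):

```python
# Sketch: several of the micro-level orderings above expressed as sort keys.

from datetime import date

items = [  # hypothetical LMS features with usage counts and launch dates
    {"name": "quiz", "count": 40, "added": date(2017, 3, 1)},
    {"name": "forum", "count": 75, "added": date(2016, 5, 9)},
    {"name": "badge", "count": 12, "added": date(2017, 1, 20)},
]

alphabetical = sorted(items, key=lambda i: i["name"])                  # letter order
most_to_least = sorted(items, key=lambda i: i["count"], reverse=True)  # amount
chronological = sorted(items, key=lambda i: i["added"])                # date order

print([i["name"] for i in alphabetical])   # ['badge', 'forum', 'quiz']
print([i["name"] for i in most_to_least])  # ['forum', 'quiz', 'badge']
print([i["name"] for i in chronological])  # ['forum', 'badge', 'quiz']
```

The same keyed-sort idea covers size order, complexity order, and reverse-chronological order (via `reverse=True`).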
173. SOME ORDER PREFERABLE
• Data visualizations may be presented in a particular order or sequence.
• The order may be somewhat serendipitous only in terms of placement in a slideshow,
in a book, on a web page, and so on.
• The order may be purposeful to highlight some macro- or micro-level observation
about the data.
• No matter how the presentation order comes about, it helps to have an
underlying rationale or logical trajectory for the sequence.
• Even if viewers do not notice the organizational logic in the sequence, the learning is
made easier by having some order.
• This small section provides some ideas for the data visualization sequencing.
174. ORDER: SIMPLE-TO-COMPLEX
• Data visualizations may be presented in a simple-to-complex way, to bring
observers along with the flow of the data revelations.
• Simple pieces may be offered first to build up to a complex summary data
visualization, for example.
• Or, the sequence may begin with a complex visualization and then offer
simpler zoomed-in views to support more in-depth discussion and insights.
175. ORDER: FEATURE-BASED,
GENERAL-TO-SPECIFIC
• Most datasets today are multi-dimensional and complex. One way to sequence
data visualizations is to focus on different aspects or features of the dataset.
• It may be helpful to create an over-arching structure of the dataset’s features
and use those to organize the data visualizations.
• This is the general-to-specific, top-down, and deductive approach.
• For example, if datasets involve a learning management system, would it be
helpful to organize the data visualizations by the data dictionary? The various
features of the LMS from most commonly used to the least commonly used?
The features by role (student, faculty, advisor, instructional designer, librarian,
and administrator)?
176. ORDER: FEATURE-BASED,
SPECIFIC-TO-GENERAL
• Another way is to start with the minutiae and details and broaden out.
• This is the specific-to-general, bottom-up, and inductive approach.
• This approach can be used to build interest and suspense…as to where the
details are leading.
• For example, in a study of social image sets, it is possible to code the imagery
to different categories first (in an emergent way, without a priori assumptions),
and then identify data patterns in the imagery…and then hypothesize from the
empirical data. The data visualizations can move from the details to the over-
arching macro structures in that sequential order.
177. ORDER: TIME-BASED
• Data visualizations may show a phenomenon changing over time.
• In this case, time is usually chronological.
• The changes may be a factor of time, a factor of an intervention or multiple
interventions, a factor of a process, or other factors.
• Time itself may be discrete, phased, or continuous.
• The time may be in sequential order, reverse-sequential order, or some mix of
phasing.
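The discrete, phased, and continuous treatments of time described above can be sketched as follows; the event dates and values are hypothetical:

```python
# A sketch of time as discrete, phased, or continuous/sequential,
# using hypothetical (date, value) events.
from datetime import date

events = [
    (date(2016, 1, 5), 12),
    (date(2016, 4, 20), 30),
    (date(2016, 9, 2), 18),
    (date(2017, 2, 11), 45),
]

# Discrete: a particular slice-in-time.
slice_2016_04_20 = [v for d, v in events if d == date(2016, 4, 20)]

# Phased: bucket events into periods (here, by year).
phases = {}
for d, v in events:
    phases.setdefault(d.year, []).append(v)

# Sequential: sort by date (reverse=True would give reverse-sequential order).
sequential = sorted(events, key=lambda e: e[0])

print(slice_2016_04_20)            # [30]
print(phases)                      # {2016: [12, 30, 18], 2017: [45]}
print([v for _, v in sequential])  # [12, 30, 18, 45]
```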
178. ORDER: SPATIAL,
ZOOMING IN- AND OUT-
• Data visualizations often contain complexity.
• Another organizational sequence may involve the following based on spatial
and scale views:
• Zooming-in to a data visualization for deeper micro understandings
• Zooming-out from a data visualization for deeper macro understandings
179. ORDER: AMOUNT OR INTENSITY
• The “most-to-least” (descending order) and “least-to-most” (ascending order)
approach enables a sense of substance.
• For example, the most frequent (mode) word in a text set may be introduced in a
data visualization, whether that word was identified via a word frequency count, a
topic model, or some other method.
• Then, data visualizations showing other highlighted terms in descending order may be
introduced.
• Then maybe insights from the long tail in the text corpora may be introduced.
• This whole theoretical sequence is in descending order, from most to least.
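The most-to-least sequence above can be sketched with a simple word frequency count; the toy text set below is hypothetical, and `collections.Counter` is Python standard library:

```python
# A minimal sketch of a "most-to-least" (descending) sequence: a word
# frequency count over a toy text set, so the mode word leads and the
# long tail trails.
from collections import Counter

texts = [
    "learning online learning data",
    "data visualization data learning data",
]
words = " ".join(texts).split()
counts = Counter(words)

for word, n in counts.most_common():
    print(word, n)
# data 4
# learning 3
# online 1
# visualization 1
```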
180. ORDER: THEORETICAL AND ACTUAL
• Another sequence may begin with a concept or model or some theoretical
conceptualization followed by empirical and actual data.
• This is the top-down approach, beginning with the general and moving to the specifics.
• Or, the sequence can go the other way, with observations from the real world…and
moving to a more general data visualization.
• This is more of a bottom-up approach, beginning with specifics and moving to the
general.
182. A DATA VISUALIZATION “SURROUND”
• Data visualizations may be presented not only as stand-alone visualizations but
within a context or surround.
• A most close-in aspect of context may involve the data visualization directly.
• An important aspect of context involves the backstory behind the data
visualization.
• Where did the data come from? What sort of research was conducted in order to
capture the data? How was the data cleaned and processed?
• If datasets were mixed, where did the data come from? How were the datasets
mixed? Who should be credited?
• What are some qualifiers that need to be applied to the data visualizations?
183. SOME BENEFITS OF PROVIDING
“CONTEXT”
• If designed properly, a context for a data visualization achieves the following:
• enriches the data
• provides direction for proper interpretation of the data (highlights what “story” the data
are telling)
• suggests the relevance of the data in the real world
• raises interest about the data visualization(s)
• offers access to the underlying dataset
• provides ideas about where to acquire more relevant information about the related data
• gives credit where it is due for the data visualization, the dataset, the research, and other
related information, and others
184. ELEMENTS OF “CONTEXT”
• At a superficial level, data visualization “context” involves the lead-up and lead-away
text surrounding the data visualization.
• This may include stories to “set up” the phenomenon under study.
• This may include table data and downloadable datasets.
• This may include captioning, credits, research citations, and other details.
• This may include qualifiers.
• This may include lead-up multimedia (audio, video, and others) to prime learners to
understand the data visualization.
• There may be a lead-up or lead-away interview by the researchers or data analysts
or others related to the work.
185. “CONTEXT” BY ASSIGNMENT
• The learning situation offers some direction for the design of data visualization
context. Especially in a learning context, the instrumental uses of the data
visualization are important.
• The assignment should specify how learners should read / use the data
visualization or the data visualization sequence.
• For cognitive scaffolding, it may help to let learners know what to pay attention to in
the respective data visualizations. In a simple case, learners may only need to view the
data visualization and interpret what its meaning is.
• Some assignments can be broadly open-ended, with the data visualization(s) as
a jumping-off point for discussions, analyses, research, and other work.
186. DATA VISUALIZATIONS IN
ONLINE LEARNING CONTEXTS
• A slideshow
• A video
• A simulation
• A discussion board conversation
• A case for analysis
• A role play
• A group project
• A writing assignment
• A research assignment
• A field trip, and others
187. “CONTEXT” BY ISSUE
• Another method to build a surround around a data visualization or series of
data visualizations is by contextualizing these as part of an issue.
• An issue may be an in-world phenomenon, with its own history, evolution,
present, and future. There may be particular dynamics with this phenomenon
and certain levers and mechanisms that may affect the changes to this
phenomenon.
• The data visualization(s) may be presented to highlight aspects of this in-world
issue.
189. WHY USER INTERACTIVITY?
• Data visualizations are not just static and flat files.
• Many enable various types of interactions:
• adjusting parameters of a model (such as data inputs and outputs);
• engaging time (speeding it up, slowing it down, stopping it);
• zooming in and out to disambiguate data, interrelationships, and other dimensions,
and
• accessing underlying data.
• Interactions with data visualizations may enable easier learning (with lower
cognitive loads) and the creation of insights.
190. DATA INPUTS AND OUTPUTS
• There are a number of data visualizations (built on NetLogo, Wolfram Language,
and others) that enable users to change up the parameters of the data
visualizations (including data) in order to see what will happen.
• Such data visualizations are focused on system effects of different parameters.
• Often, inputs may be emplaced with slider bars or forms.
• In some cases, it is important to design these with natural data limits (so as not to
enable going beyond reality). In other cases, such interactive data visualizations are
able to be informed by imaginary data ranges and others.
• Some of these data visualizations enable predictivity into the unknown and into the
future. (Agent-based models can be played out into imagination realms by enabling
hundreds of thousands of iterations or more, to see how systems change over time
given theoretical parameters.)
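A minimal sketch (in Python, not NetLogo or Wolfram Language) of a parameter-driven model behind such a visualization: a user-supplied growth rate is clamped to "natural" limits before the simulation runs, mimicking a slider with hard bounds. The logistic-growth model and its parameter ranges are illustrative assumptions only:

```python
# Sketch: a slider-style input with natural data limits feeding a
# simple (hypothetical) logistic-growth simulation.
def simulate_growth(rate, steps=10, population=10.0, capacity=100.0):
    rate = max(0.0, min(rate, 1.0))  # natural data limits on the input
    history = [population]
    for _ in range(steps):
        population += rate * population * (1 - population / capacity)
        history.append(population)
    return history

low = simulate_growth(0.1)
high = simulate_growth(0.9)
out_of_range = simulate_growth(5.0)  # silently clamped to 1.0

print(round(low[-1], 1), round(high[-1], 1))
```

Re-running with different rates shows the system effects of different parameters, which is the core of this kind of interactive visualization.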
191. ENGAGING TIME
• Some data visualizations enable viewers to engage time…to start at particular
points of the 4D visualization, to pause, to restart, and so on. Data
visualizations may sometimes be slowed down or sped up.
• The phenomena in such data visualizations include those that illuminate
systems and system effects.
192. ZOOMING IN- AND OUT-
TO DISAMBIGUATE
• Some data visualizations may be sufficiently complex that objects in data
visualizations may be occluded. To disambiguate complex data visualizations,
such as word networks or 3D cluster diagrams, many enable zooming in and
out to disambiguate the data.
• Many of these also enable the moving around of nodes and links in order to
enable clear visibility.
• Some enable zooming in to particular relationships and specific dynamics in the
data.
193. ACCESS TO UNDERLYING DATA
• Another type of interactivity with data visualizations involves viewers accessing
the underlying data behind the data visualization.
• For example, a text set which has been coded for sentiment may be explored
by clicking on a bar on a bar chart, to access the coded data under that
particular level of sentiment. Or a node representing an interview subject may
be clicked to access the underlying transcript.
• This type of interactivity enables the individual to explore the related data
more deeply.
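The drill-down interaction above can be sketched as a mapping from a chart element back to its underlying records. All records, field names, and the `drill_down` helper below are hypothetical:

```python
# Sketch: each bar in a sentiment bar chart maps back to the coded
# records behind it, so "clicking" a bar (here, a function call)
# surfaces the underlying data.
coded_records = [
    {"id": 1, "sentiment": "positive", "text": "The course was engaging."},
    {"id": 2, "sentiment": "negative", "text": "The pacing felt rushed."},
    {"id": 3, "sentiment": "positive", "text": "Clear, useful visuals."},
]

def drill_down(records, sentiment_level):
    """Return the records behind one bar of the chart."""
    return [r for r in records if r["sentiment"] == sentiment_level]

positives = drill_down(coded_records, "positive")
print(len(positives))                # 2
print([r["id"] for r in positives])  # [1, 3]
```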
195. HUMAN DECISION-MAKING
• Data and data visualizations provide information about in-world phenomena
and in-world potentials.
• There are computational methods that enable the surfacing of latent patterns from
data that would be invisible otherwise.
• Data visualizations make latent insights visible and human-perceivable.
• Data dashboards often provide live and real-time data for awareness, decision-making,
and actions.
196. HUMAN DECISION-MAKING (CONT.)
• Ultimately, it is the data behind the data visualization that should inform the
decision-making.
• For data to reflect the world, it has to be properly collected.
• The targeted data have to provide “signal” (indicators of phenomena-of-interest) vs.
“noise” (non-informative static).
• It’s rare that one data visualization or even a sequence or a set will be
sufficiently informative or compelling to sway an important decision, but data
visualizations may be powerful depending on how they are created and
harnessed.
197. HUMAN DECISION-MAKING (CONT.)
• It is rare for all sources to point in one direction.
• If all data sources do sing from the same hymnbook, then it may be that the decision-
makers should have a broader data diet that allows a wider range of informational
sources to be accessed for varying perspectives.
199. “BIG(GISH) DATA”…THE DATA VISUALIZATIONS
• Debated definition of “big(gish) data”:
• Millions of lines of records and numerous columns of attribute values
• N = all (and “all” = everything available and in whatever forms?)
• Structured (datasets and data tables) and semi-structured / unstructured data (text, imagery, audio, video, and others)
• Data may be dynamic (vs. static) and analyzed in transit
• All the usual suspects in terms of data visualizations, plus
• Word clouds
• Cluster diagrams
• Network diagrams
• Mixed-item data visualizations
• Dynamic data usually represented on data dashboards, data crawls, and other fast-changing formats
“BIG DATA” AND DATA VISUALIZATIONS
201. THE UNDERLYING DATA
AND DATA VISUALIZATIONS
• Data are raw, information is selective and processed, and data visualizations are
selective image-based summaries of data and information.
• This data may be descriptive, inferential, deductive, inductive, analytical, conceptual,
predictive, or some mix of the prior.
• Data visualizations may be sourced from a variety of data—some of it empirically
obtained and some from the human imagination.
• Understanding the origins of the data and how it was harvested, created, processed,
handled, and represented is important to understanding data visualizations.
202. VARIOUS TYPES OF DATA
VISUALIZATIONS
• Historically, structured and semi-structured data have particular ways that they
are explored and visually expressed.
• Data visualizations have conventions that they must follow based on prior practice
and common understandings.
• Data visualizations may be in 2D, 3D, and 4D, as well as other dimensions.
• Data visualizations are not word-free zones.
• The words used as labels and descriptors have to be precise and align with the data
representations from the underlying dataset and the data visualization elements
themselves. Language matters.
203. VARIOUS TYPES OF DATA
VISUALIZATIONS (CONT.)
• Data visualizations in online learning may be sequenced, contextualized, and
made-interactive to enhance learning.
• Data visualizations may be manually created, machine-drawn from data, or
some combination of the prior.
• Data visualizations—both static and dynamic—may be used to inform and
enhance human decision-making.
204. EFFECTIVENESS
• To be effective, data visualizations have to
• represent the underlying data accurately
• highlight relevant aspects of the data
• employ proper design
• follow basic data visualization conventions for the data and form, among others
• To align with the cognitive theory of multimedia learning, data visualizations
• should enhance learner perception and learning by employing strategies to lighten
cognitive load
205. STAND-ALONE
DATA VISUALIZATION CAPABILITIES
• Data visualizations are usually used in a learning or other context, but they are
often separated from their original contexts and must be understandable even as
stand-alone charts, tables, or figures.
• A data visualization, as a stand-alone, should not lead to misunderstandings (or
negative learning).
• Also, a stand-alone data visualization should be sufficiently professional-looking
because of “optics” and public reputations.
• With reverse image searches, if the original data visualization was found and
mapped by Web crawlers or spiders, it is possible that users may find their way
back to the original context of the data visualization’s usage (unless this content is
behind an authentication layer).
206. STAYING LEGAL
• Data visualizations should be based on solid research practices.
• All sources should be cited and given credit.
• Data should not be handled in any misleading way.
• Data visualizations should be created in legal ways. Relevant laws include
intellectual property, privacy protections, accessibility, and others.
207. HOW TO GET BETTER GOOD
• Train your eyes to see. Go looking for a range of data visualizations.
• Humans have to train themselves to observe more precisely than normal. In everyday
states, people tend to be pretty sloppy observers.
• Go elbow deep in data in all forms.
• Work on visualizing that data using different data visualizations and noting the
strengths and limits of each of the visualizations.
• Any changes to the underlying data mean updates to visualizations. Develop a sense
of when each data processing step is actually pseudo-complete before moving to the
next step (to avoid “make-work”).
208. HOW TO GET BETTER GOOD (CONT.)
• Avoid getting caught up in the dazzle of data visualizations.
• Do not leave unexploited raw data because the focus is on the visualizations.
• Read about how to create effective data visualizations.
• Get familiar with a range of software and analog tools for data visualization.
Put these into practice.
• Experiment broadly.
209. HOW TO GET BETTER GOOD (CONT.)
• Take on a range of masters (publishers, clients, supervisors, students, and
others) with different data visualization needs, and work hard to meet their
needs. Invite healthy and constructive critique, in order to continue to
improve.
• Communicate to the broader public(s) with data visualizations.
• Learn from how the users of data visualizations use them and what they say. If there
are repeated themes in responses, that may be something to pay attention to.
Sometimes, anomalous responses may spark insights.
• Give yourself time to improve (be patient) but not too much time (don’t be
lazy).
• Keep on working at getting better (than wherever you’re at), and aim to get good.
210. DATA VISUALIZATION “SIGNATURES”
Contributing Variables to “Signatures”    % Influence on Signatures
Access to Data                            20
Analyst Name Recognition                  10
Applied Technologies                      10
Data Handling Methods                     10
Domain and Content Area(s)                10
Look and Feel                             20
Research Impact                           20
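Assuming the percentages above are meant as weights summing to 100%, a composite "signature strength" score could be computed as a weighted sum; the component ratings below are hypothetical:

```python
# Sketch: combining the table's percentage weights into one score.
# The 0-1 component ratings are hypothetical placeholders.
weights = {
    "Access to Data": 0.20,
    "Analyst Name Recognition": 0.10,
    "Applied Technologies": 0.10,
    "Data Handling Methods": 0.10,
    "Domain and Content Area(s)": 0.10,
    "Look and Feel": 0.20,
    "Research Impact": 0.20,
}
assert abs(sum(weights.values()) - 1.0) < 1e-9  # the table sums to 100%

ratings = {k: 0.5 for k in weights}  # hypothetical mid-level ratings
score = sum(weights[k] * ratings[k] for k in weights)
print(round(score, 2))  # 0.5
```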
213. DESIRED TECHNOLOGICAL FEATURES
OF DATA VISUALIZATIONS
• Data visualizations should be designed optimally for the following technological
features:
• accessibility
• human readability
• usability across platforms and devices
• machine readability
• preservation (future-proofing across time)
214. ABOUT SOFTWARE TOOLS
• Software tools have to be used appropriately for accurate data visualizations.
• Software tools have differing strengths for data visualizations, so it helps to
know what the respective capabilities are and in what sequence and method
these may be applied for different effects.
• Predictive analytics tools have tests of models…
• Manual drawing tools have grids, guidelines, templates, pre-made shapes, and pullout
capabilities.
• Different software tools may be used for different capabilities in various
sequences. It is rare to just use one tool during the entire sequence of data
cleaning, visualization, polish, and finalization.
215. SOFTWARE USED IN THIS SLIDESHOW
• These data visualizations were created with various types of concepts and data,
data sources, seeding terms, data parameters, and software.
• The software used for data visualizations includes the following (in alphabetical
order): Google Books Ngram Viewer, Google Correlate, IBM’s SPSS Statistics,
LIWC2015, Microsoft Excel 2016, MS Visio, NetLogo, NodeXL template add-on to
Excel (or Network Overview, Discovery and Exploration for Excel, by Microsoft and
available on MS’s CodePlex, which will be decommissioned by Dec. 2017, and
thereafter available off GitHub), NVivo 11 Plus (QSR International), Qualtrics,
RapidMiner Studio, Streamgraph Add-on to MS Excel 2016 (Microsoft Research), and
Tableau Public. Backup software for digital data visualization processing includes
Gadwin PrintScreen and Adobe Photoshop.
• Note: The presenter has no professional ties to any of the software makers
mentioned here.
216. CONTACT AND CONCLUSION
• Dr. Shalin Hai-Jew
• Instructional Designer
• iTAC
• Kansas State University
• 212 Hale / Farrell Library
• shalin@k-state.edu
• 785-532-5262
• Data Sources: In the few cases where
outside open data was used, the sources are
cited. Otherwise, all other data were
collected by the author, and the
visualizations were self-generated. General
research sources are cited via links.
• Thanks! I am grateful to the organizers of
the 4th Annual Big 12 Teaching & Learning
Conference at Texas Tech University for
including this presentation in their lineup.
• Caveat: This presenter is working at
getting better at data visualizations and is a
long ways from “good” yet.