A wide range of contemporary research uses online surveys. This presentation provides an overview of ways to exploit survey-captured data for analysis. It summarizes the basic survey and item analyses that may be performed on survey results, and it offers a range of tips for extracting, cleaning, structuring, and presenting both quantitative and qualitative data for data-consumer sense-making. The Qualtrics survey platform serves as the exemplar, with Excel 2013 and NVivo 10 as the supporting analysis tools. Real-world projects are used to demo these approaches, with principal investigator (PI) permission.
What is NodeXL (Network Overview, Discovery and Exploration for Excel)?
Graph aesthetics in NodeXL
Visual pleasure
Cognitive pleasure
Bridging to NodeXL for research and analysis
LIWC-ing at Texts for Insights from Linguistic Patterns (Shalin Hai-Jew)
Since the mid-1990s, researchers have used the Linguistic Inquiry and Word Count (LIWC, pronounced "luke") software tool to explore text corpora for hidden insights from linguistic patterns. The LIWC tool has evolved over the years; simultaneously, research using computational text analysis has shed light on deception, threat assessment, personality, predictive analytics, and other areas. This presentation highlights some applications of LIWC in the research literature and showcases the tool on some original text sets.
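The core mechanic LIWC relies on, counting how often a text's words fall into psychologically meaningful categories and reporting each category as a percentage of total words, can be sketched in a few lines of Python. The category word lists below are illustrative stand-ins, not the actual (proprietary) LIWC dictionary:

```python
import re
from collections import Counter

# Illustrative mini-dictionary; the real LIWC dictionary has dozens of
# categories and thousands of entries.
CATEGORIES = {
    "positive_emotion": {"happy", "good", "love", "great"},
    "negative_emotion": {"sad", "bad", "hate", "awful"},
    "first_person": {"i", "me", "my", "mine"},
}

def liwc_style_counts(text):
    """Return each category's share of total word count, as percentages."""
    words = re.findall(r"[a-z']+", text.lower())
    totals = Counter()
    for word in words:
        for category, lexicon in CATEGORIES.items():
            if word in lexicon:
                totals[category] += 1
    n = len(words) or 1
    return {cat: 100.0 * totals[cat] / n for cat in CATEGORIES}

scores = liwc_style_counts("I love my work, but the deadline is bad.")
```

Here "I" and "my" would register as first-person words (2 of 9 words), with one positive-emotion and one negative-emotion hit; real LIWC output is a much wider profile of such percentages.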
Designing Online Learning to Actual Human Capabilities (Shalin Hai-Jew)
In instructional design work, instructional designers (IDs) often focus on changing technological capabilities (of authoring tools, of learning management systems, and so on), namely on enablements/affordances and constraints. Less often discussed are human capabilities, with their own affordances and constraints. Human enablements may be broadly conceptualized as the following: (1) perception (the five senses and proprioception), (2) cognition, (3) learning, (4) memory, (5) decision-making, and (6) action-taking. This presentation summarizes some of the latest research in these areas of human capability and some mitigations for designing around these particular aspects of people.
Capitalizing on Machine Reading to Engage Bigger Data (Shalin Hai-Jew)
What are some ways to select, say, 200 research articles to "close read" from a set of 2,000 PDF articles gleaned from library databases and Google Scholar? How can a researcher make sense of a trending issue in a flood of Tweets and retweets (RTs) based on a particular hashtag (#) or keyword search, or of an especially lively Tweetstream from a particular social media account? People are dealing with ever more prodigious amounts of information, from a number of sources. Those who are savvy about using computers to aid their reading (through "distant reading" or "not-reading") may find that they can cover much more ground. This presentation introduces the use of NVivo 11 Plus (matrix queries, word frequency counts, text searches and dendrograms, cluster analyses, topic modeling, and others) for multiple cases of distant reading to aid academic and research work.
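The word-frequency pass that underlies this kind of distant reading can be sketched minimally as follows; the stopword list here is a small illustrative sample, not the full list a tool like NVivo applies:

```python
import re
from collections import Counter

# Tiny illustrative stopword list; real tools filter hundreds of such terms.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "that"}

def top_terms(documents, k=10):
    """Rank the k most frequent non-stopword terms across a corpus."""
    counts = Counter()
    for doc in documents:
        counts.update(w for w in re.findall(r"[a-z]+", doc.lower())
                      if w not in STOPWORDS)
    return counts.most_common(k)

corpus = ["Distant reading counts words.",
          "Reading at scale means counting words, not reading closely."]
top_terms(corpus, 2)  # → [('reading', 3), ('words', 2)]
```

A frequency table like this is the starting point for the dendrograms and cluster analyses the abstract mentions, which group documents by how similar their term profiles are.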
See Ya! Creating a Custom Spatial-Based Linguistic Analysis Dictionary from ... (Shalin Hai-Jew)
American Renunciation of Citizenship (by the numbers)
LIWC2015 and Custom Dictionaries
Tapping Twitter, Facebook, Flickr, Wikipedia, and Reddit
The “See Ya!” Dictionary
Lessons about Custom Spatial-Based Dictionary-Making
Space, Place, and the Renunciation of U.S. Citizenship (from social media datasets)
Some Future Research Directions
"Mass Surveillance" through Distant Reading (Shalin Hai-Jew)
Distant reading refers to the use of computers to "read" texts by counting words, identifying themes and subthemes (through topic modeling), extracting sentiment, applying psychological analysis to the author(s), and otherwise finding latent or hidden insights. This work draws on research into "mass surveillance" across five text sets: academic writing, mainstream journalism, microblogging, Wikipedia articles, and leaked government data. The purpose was to capture, indirectly, some insights about the collective social discussions occurring around this issue. This presentation uses a variety of data visualizations (article network graphs, word trees, dendrograms, treemaps, cluster diagrams, line graphs, bar charts, pie charts, and others) to show how machines read and the types of summary data they enable (at computational speeds, at machine scale, and in a reproducible way). Some computational linguistic analysis tools also enable the creation of custom dictionaries for unique types of applied research. The tools used in this presentation include NVivo 11 Plus and LIWC2015.
Formations & Deformations of Social Network Graphs (Shalin Hai-Jew)
Social network graphs are node-link (vertex-edge; entity-relationship) diagrams that show relationships between people and groups. Open-source tools like NodeXL Basic (available on Microsoft's CodePlex) enable the capture of network data from select social media platforms through third-party add-ons and social media APIs. From social groups, relational clusters are extracted with clustering algorithms that identify intensities of connection. Visually, structural relational data is conveyed with layout algorithms in two-dimensional space. Using these various layout options and built-in visual design features, it is possible to aesthetically "deform" the network graph data for visual effects. This presentation introduces novel datasets and novel data visualizations.
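The clustering idea, grouping vertices that are connected to one another, can be illustrated with a plain-Python connected-components pass over an edge list. This is a deliberately simple stand-in for the real cluster-detection algorithms tools like NodeXL offer (e.g., Clauset-Newman-Moore), which additionally weigh connection density:

```python
def connected_components(edges):
    """Group vertices into connected components via union-find; a toy
    stand-in for the cluster detection in tools like NodeXL."""
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path compression
            v = parent[v]
        return v

    for a, b in edges:
        parent[find(a)] = find(b)  # union the two vertices' groups

    groups = {}
    for v in list(parent):
        groups.setdefault(find(v), set()).add(v)
    return sorted(groups.values(), key=len, reverse=True)

edges = [("ann", "bob"), ("bob", "cat"), ("dee", "eve")]
connected_components(edges)  # → [{'ann', 'bob', 'cat'}, {'dee', 'eve'}]
```

In a graph tool, each resulting group would then be assigned its own color or region in the two-dimensional layout.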
Understanding Public Sentiment: Conducting a Related-Tags Content Network Ext... (Shalin Hai-Jew)
This presentation focuses on how to understand public sentiment through a related-tags content network analysis of public Flickr photos and videos. NodeXL is used to conduct data extractions and visualizations of user-tagged Flickr contents and the resulting “noisy” folksonomies. What mental connections may be made about particular issues based on analysis of text-annotated graphs?
Using Qualtrics to Create Automated Online Trainings (Shalin Hai-Jew)
When thinking about “transformational teaching and learning,” training would not be the first thing to come to mind.
The Qualtrics® research suite offers a number of design tools and features that enable the building of automated online trainings. There are the baseline features such as the ability to integrate multimedia, apply various question designs, enable accessibility features (like alt-texting), deliver a mobile experience, reach learners across distances, and provide basic security and data integrity features.
Other features make this tool phenomenally powerful. One is the ability to richly customize learning sequences by learner profile, by performance (behavior), by selection, or by a mix of factors. Another feature enables the scoring of learner responses and the setting of a threshold for passing. The tool has rich data analytics capabilities (including a light item analysis), including online analytics and even cross-tabulation analysis. A Qualtrics® API enables the automated recording of online assessment scores and learner behaviors to faculty/staff/student information systems.
Trainings are critical for effective workplace functioning and professional development. The same features in Qualtrics® that enable the effective building of automated trainings also enable the effective building of pre-learning modules or sequences for learners who need to refresh their skills for a new course. This digital slideshow introduces the use of Qualtrics® as a customizable training and pre-learning module tool.
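The score-and-threshold logic described above amounts to the sketch below. The field names are invented for illustration; in Qualtrics itself, scoring and pass thresholds are configured through the survey-builder interface rather than in code:

```python
def grade_attempt(responses, answer_key, pass_threshold=0.8):
    """Score a training attempt and report pass/fail against a threshold.

    responses and answer_key map question IDs to answer choices;
    pass_threshold is the fraction of correct answers required to pass.
    """
    correct = sum(responses.get(q) == a for q, a in answer_key.items())
    score = correct / len(answer_key)
    return {"score": score, "passed": score >= pass_threshold}

grade_attempt({"q1": "a", "q2": "b"}, {"q1": "a", "q2": "c"})
# → {'score': 0.5, 'passed': False}
```

A result like this is what an API integration would forward to a staff or student information system to record completion.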
Hashtag Conversations, Eventgraphs, and User Ego Neighborhoods: Extracting So... (Shalin Hai-Jew)
This presentation introduces methods for extracting and analyzing social network data from Twitter for hashtag conversations (and emergent events), event graphs, search networks, and user ego neighborhoods (using NodeXL). There will be direct demonstrations and discussions of how to analyze social network graphs. This information may be extended with human- and/or machine-based sentiment analysis.
Writing and Publishing about Applied Technologies in Tech Journals and Books (Shalin Hai-Jew)
This slideshow provides insights on how to write and publish about applied technologies in tech journals and books, including the following:
Getting started in tech publishing
Cost-benefit calculations
Parts to an article; parts to a chapter
Writing process
Collaborating
Publishing process
Acquiring readers (and citations)
Post-publishing
Next works
Building Surveys in Qualtrics for Efficient Analytics (Shalin Hai-Jew)
Qualtrics® is a state-of-the-art online research suite that enables sophisticated data collection and analytics. This presentation describes how to build a survey for efficient analytics, both within and outside Qualtrics®. It emphasizes the importance of thinking through the data collection, the analytics, and the data presentation in order to build a survey instrument that works for the research context. Along the way, some of the cutting-edge survey-building capabilities of Qualtrics® (including rich question types, invisible questions, branching logic, display logic, panel triggers, and others) will be showcased along with the data analytics functionalities (including cross-tab analysis and data visualizations).
Researchers have long known that the words of a text contain more information than appears on the surface. As such, texts have been studied for subtexts and other latent or hidden information. One approach has involved the machine-enabled analysis of human sentiment, usually mapped on a positive-negative polarity. NVivo 11 Plus (a qualitative research tool released in late 2015) enables the automated sentiment analysis of texts (coded research, formal articles, text corpora, Tweetstream datasets, Facebook wall posts, websites, and other sources) based on four categories: very positive, moderately positive, moderately negative, and very negative. The tool compares the target text set against a sentiment dictionary and enables coding at different units of analysis: sentence, paragraph, or cell. Further, the sentiment capability extracts the coded text into respective text sets, which may be further analyzed using text frequency counts, text searches, automated theme and sub-theme extractions (topic modeling), and data visualizations.
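The dictionary-comparison step described above can be sketched as follows. The word list, weights, and score thresholds here are invented for illustration; NVivo's actual sentiment dictionary and scoring rules are proprietary:

```python
# Toy sentiment lexicon mapping words to valence scores. Invented for
# illustration; NVivo's real dictionary and weights are not public.
LEXICON = {"excellent": 2, "good": 1, "poor": -1, "terrible": -2}

def code_sentence(sentence):
    """Assign one of four sentiment bins (or None for neutral/uncoded)."""
    score = sum(LEXICON.get(w.strip(".,!?").lower(), 0)
                for w in sentence.split())
    if score >= 2:
        return "very positive"
    if score == 1:
        return "moderately positive"
    if score == -1:
        return "moderately negative"
    if score <= -2:
        return "very negative"
    return None

code_sentence("The service was excellent!")  # → 'very positive'
```

Running this per sentence, paragraph, or cell mirrors the "units of analysis" choice the abstract describes; the sentences collected into each bin form the text sets available for follow-on queries.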
Building a Digital Learning Object w/ Articulate Storyline 2 (Shalin Hai-Jew)
The digital learning object (DLO) is still a common staple in online learning. One of the more sophisticated authoring tools for building DLOs is Articulate Storyline 2, which enables the integration of multimedia (including screen captures with Articulate Replay), the building of animations, branching, and other features. Its packaging allows a full range of SCORM and Tin Can API outputs and versioning in HTML5. This presentation will introduce the software tool and some of its capabilities to provide a sense of where digital learning objects may be headed.
Eavesdropping on the Twitter Microblogging Site (Shalin Hai-Jew)
Research analysts go to Twitter to capture the general trends of public conversations, identify and profile influential accounts, and extract subgroups within larger collectives and larger discourses; they also go to eavesdrop on individual self-talk and individual-to-individual conversations. So what is technically in your tweets? Dave Rosenberg famously asked this in a CNET article (2010). The answer: a whole lot more than 140 characters. How are the most influential social media accounts identified through #hashtag graphs? How are themes extracted? How are sentiments understood? How can users be profiled through their Tweetstreams? How can locations be mapped in terms of the Twitter conversations occurring in particular physical areas? How can live and trending issues be identified and categorized in terms of sentiment (positive, negative, and neutral)? This presentation will summarize some of the free and open-source tools, as well as commercial and proprietary ones, that enable increased knowability.
Letting the Machine Code Qualitative and Mixed Methods Data in NVivo 10 (Shalin Hai-Jew)
An experimental feature in NVivo 10 (circa 2013), Autocoding by Existing Pattern, enables the application of semi-supervised machine learning to ingested research data. This results in the extraction of themes and other relevant insights from data, at machine speeds, based on the classification algorithm. This presentation will introduce the feature in NVivo 10 (on both Windows and Mac platforms). It will show how the machine can achieve high inter-rater reliability (a Cohen's Kappa of one in many cases) on the one hand but still not achieve full human sensibility from "close reading" coding on the other. The presentation will suggest a complementary balance between machine and human coding of qualitative and mixed methods data for the most efficient application of researcher time and expertise.
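Cohen's Kappa, the agreement statistic mentioned above, corrects raw agreement between two coders for the agreement expected by chance. A small implementation for two coders labeling the same items:

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Chance-corrected agreement between two coders' category labels."""
    assert len(coder_a) == len(coder_b) and coder_a
    n = len(coder_a)
    # Observed agreement: fraction of items where the coders match.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Expected agreement: chance of matching given each coder's label mix.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used a single identical category
    return (observed - expected) / (1 - expected)

cohens_kappa(["x", "x", "y", "y"], ["x", "x", "x", "y"])  # → 0.5
```

A kappa of 1 means perfect chance-corrected agreement, which is why a machine coder that simply reapplies its learned pattern can hit that ceiling while still missing nuances a human close reader would catch.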
Native Emigration from the U.S. and Renunciation of U.S. Citizenship (Shalin Hai-Jew)
This presentation summarizes some initial research on the phenomena of renunciation of U.S. citizenship and green card status. It highlights some of the basic literature and then uses social media to tap an indirect sense of public attitudes toward this and peripherally related issues.
Exploring Article Networks on Wikipedia with NodeXL (Shalin Hai-Jew)
With 4.7 million articles in its English version, the crowd-sourced online encyclopedia Wikipedia is regularly one of the top-ten most-visited sites online. For many, it is the go-to source for a first read on a topic. The open-source and free Network Overview, Discovery and Exploration for Excel (NodeXL), an add-on to Microsoft Excel, enables the capture of "article networks" from Wikipedia. Such content-network data visualizations enable the development of research leads; some understanding of public conceptualizations of related concepts, peoples, events, and phenomena; the profiling of Wikipedia editors (both humans and 'bots); and other research insights. This presentation will showcase this affordance of NodeXL and provide some ideas for practical applications of this channel of research and knowing.
This slideshow reviews some of the features and functionalities of Qualtrics that enable its use in online trainings. It explores some important instructional design elements of online trainings, including for three main types: policy compliance, mass-scale trainings, and customized trainings. It also reviews some core elements of online trainings. Finally, there are some reflections on real-world considerations when building an online training in Qualtrics.
Agile Bringing Big Data & Analytics Closer (Nitin Khattar)
In today's world, data has become the nucleus of nearly every invention and innovation, whether it is generated by financial organizations, stock markets, and social media or by an individual's eating habits, likes, and dislikes. Almost everything we do every day produces loads of useful data.
But without meaningful judgment, without labels, without semantics attached, this data is nothing more than a big black hole. This is where analytics comes in, giving data its actual identity.
It is important for every organization to bridge this gap between data and analytics and help the two work hand in hand. Agile offers a solution to this problem.
Spatio-temporal Sensor Integration, Analysis, Classification or Can Exascal... (Joel Saltz)
Presentation at Clusters, Clouds and Data for Scientific Computing 2014
Integrative analyses of large-scale spatio-temporal datasets play increasingly important roles in many areas of science and engineering. Our recent work in this area is motivated by application scenarios involving complementary digital microscopy, radiology, and "omic" analyses in cancer research. In these scenarios, the objective is to use a coordinated set of image analysis, feature extraction, and machine learning methods to predict disease progression and to aid in targeting new therapies. I will describe the tools and methods our group has developed for the extraction, management, and analysis of features, along with the systems software methods for optimizing execution on high-end CPU/GPU platforms. With our current work as an introduction, I will then describe (1) related but much more ambitious exascale biomedical and non-biomedical use cases that also involve the complex interplay between multi-scale structure and molecular mechanism, and (2) concepts and requirements for methods and tools that address these challenges.
Presentation given by Dr Xin-Yi Chua at the 'Sharing Health-y Data Workshop: Challenges and Solutions' event co-hosted by ANDS and HISA. Held on Wednesday 16th March 2016 at the Translational Research Institute, Brisbane, Australia.
An Emerging Step: Data Warehousing to Pattern Warehousing (Harshita S. Jain)
This presentation traces the sequential progression from the advent of data mining and data warehousing to pattern warehousing, identifies present gaps, and offers ideas for future work and research to make such work easier.
RESEARCH PROCESS
SELECTION OF RESEARCH PROBLEM
REVIEW LITERATURE
MAKING HYPOTHESIS
PREPARING THE RESEARCH DESIGN
SAMPLING
DATA COLLECTION
DATA ANALYSIS
HYPOTHESIS TESTING
GENERALIZATION AND INTERPRETATION
CONCLUSION
PREPARATION OF REPORT
Usability is a measure of effectiveness. It describes how effective tools and information sources are in helping us accomplish tasks. The more usable the tool, the better we are able to achieve our goals. Many tools help us overcome physical limitations by making us stronger, faster, and more sharp-sighted. But tools can be frustrating or even disabling. When we encounter a tool that we cannot work with, either because it is poorly designed or because its design does not take into account our needs, we are limited in what we can accomplish.
Reference:
Web Style Guide: Basic Design Principles for Creating Web Sites, by Patrick J. Lynch and Sarah Horton: http://webstyleguide.com/index.html
The Jeopardy match between the two best human players of all time and the IBM Deep Q/A software, “Watson,” captured the spotlight and stimulated the imagination of the entire world. The subsequent announcement of IBM’s involvement in the creation of “Dr. Watson” has created a high level of interest in the healthcare community about the potential of this breakthrough technology as well as the potential pitfalls of the use of “artificial intelligence” in medicine. Dr. Siegel is currently working together with IBM engineers to explore how Dr. Watson can work together with physicians and medical specialists. His presentation, which was delivered on March 28th, provided a high level overview of the uniqueness of Deep Q/A Software and how it differs from other previous artificial intelligence applications.
Long nonfiction chapters are not in style and may never have been. While nonfiction book chapters average about 4,000 to 7,000 words, some run to several times that upper limit. The usual explanation is that the chapter addresses some irreducible complexity that cannot be handled in shorter form. This slideshow explores some methods for writing longer chapters while still maintaining coherence, focus, and reader interest…and while using some technological tools to write and edit more efficiently.
Overcoming Reluctance to Pursuing Grant Funds in AcademiaShalin Hai-Jew
Starting as an organization’s new grant writer can be a challenge, especially where time has lapsed since the previous grant writer left. People get out of the habit of pursuing grant funds. This slideshow addresses some of the reasons for such reluctance and proposes some ways to mitigate them.
Writing grants is one common way that those in institutions of higher education may acquire some funds—small and big, one-off and continuing—to conduct research, hire faculty and researchers and learners and others, update equipment, update or build up new buildings, and achieve other work. This slideshow explores some aspects of the work of grant writing in the present moment in higher education.
Contrasting My Beginner Folk Art vs. Machine Co-Created Folk Art with an Art-...Shalin Hai-Jew
The SARS-CoV-2 pandemic inspired several years of experimentation with common or folk art, involving mixed media, alcohol ink painting, and other explorations. Then, with the emergence of art-making generative AIs, there were further experiments, particularly with one that enables generation of visuals from scanned art and photos, text prompts, style overlays, and text-based visual modifiers. While both types of artmaking are emotionally satisfying and helpful for stress management, there are some contrasting differences. This exploratory slideshow explores some of these differences in order to partially shed light on the informal usage of an art-making generative AI (artificial intelligence).
Creating Seeding Visuals to Prompt Art-Making Generative AIsShalin Hai-Jew
Art-making generative AIs have come to the fore. A basic work pipeline typically involves starting with text prompts -> generated images. A generated image may then be used to seed further iterations. Deep Dream Generator (DDG) enables “modifiers” of various types (artist styles, visual adjectives, others) to be applied in addition to the text prompt.
Another approach involves beginning with a “seeding image,” a born-digital or digitized (born-analog) visual on which AI-generated art may be based for a multi-channel and multi-modal prompt. This slideshow provides some observations of how to think about seeding images, particularly in terms of how the DDG handles them, with its “algorithmic pareidolia” (“Deep Dream,” Wikipedia, July 3, 2023).
Human art-making is often about starting mass-scale conversations. Artists are thought to help bridge humanity into the future. Whether generative AI art enables this or not is still not clear.
Common Neophyte Academic Book Manuscript Reviewer MistakesShalin Hai-Jew
The work of academic book reviewing, as a volunteer (most often), is a common academic practice. The presenter served as a neophyte reviewer for some years before settling into this invited volunteer work over several decades. There have been lessons learned over time about avoidable mistakes…from both experience and observation.
Fashioning Text (and Image) Prompts for the CrAIyon Art-Making Generative AIShalin Hai-Jew
CrAIyon (formerly DALL-E mini, its name a nod to Salvador Dalí) is a web-facing art-making generative AI tool online (https://www.craiyon.com/) that enables the use of text (and image) prompts for the creation of watermarked, lightweight visuals. Counterintuitively, the rough visuals are quite usable for recombinations, remixes, and recreations into usable digital visuals for various digital learning objects. The textual prompts are not particularly intuitive because of how the generative AI program was trained on mass-scale visuals. There is an art and occasional indirection to working prompts after each try, with the resulting nine-image proof sheets that CrAIyon outputs. The tool can be used iteratively for different outputs.
The tool sometimes turns out serendipitous surprises, including an occasional work so refined that it can be used / shared almost unedited. One challenge in using CrAIyon comes from their request for credit (for all non-subscribers to their service). Another comes from the visual watermarking (orange crayon at the bottom right of the image). However, this tool is quite useful for practical applications if one is willing to engage deep digital image editing (Adobe Photoshop, Adobe Illustrator).
Augmented Reality in Multi-Dimensionality: Design for Space, Motion, Multiple...Shalin Hai-Jew
Augmented reality (AR)—the use of digital overlays over physical space—manifests in a wide range of spaces (indoor, outdoor; virtual) and ways (in real space (with unaided human vision); in head gear; in smart glasses; on mobile devices, and others). There are various authoring technologies that enable the making of AR experiences for various users. This work uses a particular tool (Adobe Aero®) to explore ways to build AR for multiple dimensions, including the fourth dimension (motion, changes over time).
Based on the respective purposes of the AR experience, some basic heuristics are captured for
space design (1),
motion design (2),
multiple perception design (sight, smell, taste, sound, touch) (3),
and virtual- and tangible- interactivity (4).
Some Ways to Conduct SoTL Research in Augmented Reality (AR) for Teaching and...Shalin Hai-Jew
One of the extant questions about augmented reality (AR) is how (in)effective it is for the teaching and learning in various formal, nonformal, and informal contexts. The research literature shows mixed findings, which are often highly context-based (and not generalizable). There are some non-trivial costs to the design/development/deployment of AR for teaching and learning. For the users, there is cognitive load on the working memory [(1) extraneous/poor design, (2) intrinsic/inherent difficulty in topic, and (3) germane/forming schemas]. For teachers, there are additional knowledge, skills, and abilities / attitudes (KSAs) that need to be brought to bear.
Exploring the Deep Dream Generator (an Art-Making Generative AI) Shalin Hai-Jew
The Deep Dream Generator was created by Google engineer Alexander Mordvintsev in 2014. It has a public facing instance at https://deepdreamgenerator.com/, which enables people to use text prompts and image prompts (individually or in combination) to inspire the art-generating generative AI to output images. This work highlights some process-based walk-throughs of the tool, some practical uses, some lightweight art learning, some aspects of the online social community on this platform, and other insights. Some works by the AI prompted by the presenter may be seen here: https://deepdreamgenerator.com/u/sjjalinn.
(This is the first draft of a slideshow that will be used in a conference later in the year.)
Augmented Reality for Learning and AccessibilityShalin Hai-Jew
Recently, the presenter conducted a systematic review of the academic literature and an environmental scan to learn how to set up an augmented reality (AR) shop at an institution of higher education. The ambition was to not only set up AR in an accessible and legal way but also be able to test for potential +/- effects of AR on teaching and learning. The research did not go past the review stage, because of a lack of funding, but some insights about accessibility in AR were acquired.
(The visuals are from Deep Dream Generator and CrAIyon.)
Engaging Pixabay as an open-source contributor to hone digital image editing,...Shalin Hai-Jew
This slideshow describes the author's early experiences with creating two accounts on Pixabay in order to advance digital editing skills in multimedia. The two accounts are located at https://pixabay.com/users/sjjalinn-28605710/ and https://pixabay.com/users/wavegenerics-29440244/ ...
This work explores four main spaces where researchers publish about educational technology: academic-commercial, open-access, open-source, and self-publishing.
Human-Machine Collaboration: Using art-making AI (CrAIyon) as cited work, o...Shalin Hai-Jew
It is early days for generative art AIs. What are some ways to use these to complement one's work while staying legal (legal-ish)?
Correction: .webp is a raster format
Getting Started with Augmented Reality (AR) in Online Teaching and Learning i...Shalin Hai-Jew
University creative shops are exploring whether they can get into the game of producing AR-enhanced experiences: campus tours, interactive gaming, virtual laboratories, exploratory art spaces, simulations, design labs, online / offline / blended teaching and learning modules, and other AR applications.
This work offers a basic environmental scan of the AR space for online teaching and learning, and it includes pedagogical design leads from the current research, technological knowhow, hands-on design / development / deployment of learning objects, and online teaching and learning methods.
Explore our comprehensive data analysis project presentation on predicting product ad campaign performance. Learn how data-driven insights can optimize your marketing strategies and enhance campaign effectiveness. Perfect for professionals and students looking to understand the power of data analysis in advertising. for more details visit: https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/
Empowering the Data Analytics Ecosystem: A Laser Focus on Value
The data analytics ecosystem thrives when every component functions at its peak, unlocking the true potential of data. Here's a laser focus on key areas for an empowered ecosystem:
1. Democratize Access, Not Data:
Granular Access Controls: Provide users with self-service tools tailored to their specific needs, preventing data overload and misuse.
Data Catalogs: Implement robust data catalogs for easy discovery and understanding of available data sources.
2. Foster Collaboration with Clear Roles:
Data Mesh Architecture: Break down data silos by creating a distributed data ownership model with clear ownership and responsibilities.
Collaborative Workspaces: Utilize interactive platforms where data scientists, analysts, and domain experts can work seamlessly together.
3. Leverage Advanced Analytics Strategically:
AI-powered Automation: Automate repetitive tasks like data cleaning and feature engineering, freeing up data talent for higher-level analysis.
Right-Tool Selection: Strategically choose the most effective advanced analytics techniques (e.g., AI, ML) based on specific business problems.
4. Prioritize Data Quality with Automation:
Automated Data Validation: Implement automated data quality checks to identify and rectify errors at the source, minimizing downstream issues.
Data Lineage Tracking: Track the flow of data throughout the ecosystem, ensuring transparency and facilitating root cause analysis for errors.
5. Cultivate a Data-Driven Mindset:
Metrics-Driven Performance Management: Align KPIs and performance metrics with data-driven insights to ensure actionable decision making.
Data Storytelling Workshops: Equip stakeholders with the skills to translate complex data findings into compelling narratives that drive action.
Benefits of a Precise Ecosystem:
Sharpened Focus: Precise access and clear roles ensure everyone works with the most relevant data, maximizing efficiency.
Actionable Insights: Strategic analytics and automated quality checks lead to more reliable and actionable data insights.
Continuous Improvement: Data-driven performance management fosters a culture of learning and continuous improvement.
Sustainable Growth: Empowered by data, organizations can make informed decisions to drive sustainable growth and innovation.
By focusing on these precise actions, organizations can create an empowered data analytics ecosystem that delivers real value by driving data-driven decisions and maximizing the return on their data investment.
2. PRESENTATION DESCRIPTION
• A WIDE RANGE OF CONTEMPORARY RESEARCH USES ONLINE SURVEYS. THIS PRESENTATION
PROVIDES AN OVERVIEW OF WAYS TO EXPLOIT SURVEY-CAPTURED DATA FOR ANALYSIS.
THERE WILL BE A SUMMARY OF BASIC SURVEY AND ITEM ANALYSIS THAT MAY BE ACHIEVED
WITH SURVEY DATA RESULTS. THERE WILL ALSO BE A RANGE OF TIPS FOR EXTRACTING,
CLEANING, STRUCTURING, AND PRESENTING BOTH QUANTITATIVE AND QUALITATIVE DATA
FOR DATA-CONSUMER SENSE-MAKING. THE PLATFORM THAT WILL BE USED AS AN EXEMPLAR
WILL BE THE QUALTRICS SURVEY PLATFORM, AND TWO SUPPORTING TOOLS USED FOR
ANALYSIS ARE EXCEL 2013 AND NVIVO 10. REAL-WORLD PROJECTS ARE USED TO DEMO
THESE APPROACHES—WITH PRINCIPAL INVESTIGATOR (PI) PERMISSION.
3. OVERVIEW
• LIGHT DEFINITION OF “EXPLOIT”
• A REFRESHER REVIEW OF SURVEYS [EXCERPTED FROM “REVIEWING SURVEYS, INTERVIEWS AND
FOCUS GROUPS” (JAN. 30, 2015)]
• (GENERIC) RESEARCH DESIGN; BASIC PURPOSES OF SURVEYS; THE SURVEY INSTRUMENT; SURVEY
RELIABILITY; SURVEY VALIDITY; THE CREDIBILITY OF SURVEY FINDINGS; THE TIME FACTOR; SAMPLING
OF SURVEY RESPONDENTS; ONLINE SURVEYS; DATA FORMS
• QUANTITATIVE / QUALITATIVE / MIXED METHODS / MULTI METHOD
• SOME PRINCIPLES OF DATA ANALYSIS
• QUALTRICS
• ABOUT THE ONLINE SURVEY SYSTEM
• WITHIN-QUALTRICS DATA ANALYTICS
4. OVERVIEW (CONT.)
• THE USES OF OTHER SOFTWARE
• EXCEL
• MICROSOFT WORD
• NOTEPAD
• IBM SPSS
• NVIVO 10
• AUTOMAP AND ORA NETSCENES
• NODEXL, UCINET
• TABLEAU (PUBLIC), ARCMAP / ARCGIS PRO
• RAPIDMINER STUDIO
5. OVERVIEW (CONT.)
• THE RESEARCHER INTERPRETIVE LENS; THE PROBLEM OF HUMAN MANIPULATION OF DATA
• SOME REAL-WORLD CASES
• CASE 1: A MULTI-COUNTRY MULTI-GRAIN LONGITUDINAL SURVEY
• CASE 2: INFORMATION TECHNOLOGY (IT) SATISFACTION SURVEY
• FULL EXPLOITATION OF SURVEY DATA
• REALITY CHECKS & CAVEATS
• CONTACT AND CONCLUSION
7. “EXPLOIT”
• VERB: MAKE FULL USE OF A RESOURCE; TAKING FULL ADVANTAGE OF A RESOURCE
• NOUN: A DARING FEAT
8. WHY EXPLOIT?...NEW INSIGHTS
• DATA POVERTY / DATA RICHNESS
• RECOUPING HIGH EXPENSE IN COLLECTING SOME SURVEY DATA
• GRANT(S); INSTITUTIONAL COSTS; TECHNOLOGIES; PEOPLE TIME; PEOPLE EXPERTISE
• SPACE TO REPURPOSE CAPTURED DATA THROUGH CROSS-REFERENCING AND COMPARING AND
CONTRASTING DATA
• AGAINST PUBLICLY AVAILABLE DATASETS
• AGAINST SOCIAL MEDIA DATA
• AGAINST SPATIAL DATA
• AGAINST PUBLIC INFORMATION (OFTEN PUBLIC MEDIA-BASED)
• AGAINST COMPARABLE CASES
9. WHY EXPLOIT?...NEW INSIGHTS (CONT.)
• MACHINE-ENABLEMENTS FOR ANALYZING HETEROGENEOUS DIGITAL DATA (VARIOUS
SOURCES, VARIOUS FORMATS, VARIOUS STRUCTURES, VARIOUS MEDIA)
• AUTOCODING BY EXISTING PATTERN, TEXT NETWORK ANALYSIS, GEOSPATIAL MAPPING, AND
OTHER APPROACHES
• RICH DATA VISUALIZATIONS
• DATA MINING
• MODEL EXTRACTION FROM DATA
• NOT “EXPLOIT” AS TO CAUSE OR ALLOW ANY POTENTIAL HARM TO RESEARCH PARTICIPANTS;
NOT GOING BEYOND THE APPROVED USES OF THE INFORMATION
10. A REFRESHER / REVIEW OF SURVEYS
(AS A RESEARCH TOOL)
11. (GENERIC) QUAL / MIXED / MULTI METHODS
RESEARCH DESIGN
• CONCEPTUALIZATION: RESEARCH
OBJECTIVES, RESEARCH QUESTIONS,
HYPOTHESES, POTENTIAL IMPLICATIONS
• THOROUGH REVIEW OF THE LITERATURE
(ANNOTATION AND WRITE-UP)
• RESEARCH DESIGN (MIXED
METHODOLOGY / SYNTH, MULTI-
METHOD / SEQUENTIAL)
• INSTRUMENTATION (DESIGN, PILOT-
TESTING, REVISION)
• SAMPLING (SELECTION OF
RESPONDENTS: RANDOM, STRATIFIED
RANDOM, NON-RANDOM,
CONVENIENCE, OTHER)
• RESEARCH
• DATA COLLECTION (SOMETIMES MULTI-
METHOD; MIXED METHOD; FORM OF
DATA COLLECTION AFFECTS ANALYSIS)
12. (GENERIC) QUAL / MIXED / MULTI METHODS
RESEARCH DESIGN (CONT.)
• FOLLOW-UP (IF NEEDED)
• DATA VISUALIZATION
• DATA ANALYSIS (QUANTITATIVE AND
QUALITATIVE METHODS)
• “DISCUSSION” SECTION
• REPORTING OUT
• POTENTIAL IMPLICATIONS
• FOLLOW-ON RESEARCH
• FUTURE RESEARCH
13. BASIC PURPOSES OF SURVEYS
• COLLECT DATA ABOUT PEOPLE’S EXPERIENCES, SITUATIONS, ATTITUDES, BELIEFS,
OPINIONS, AND OTHER FACTORS AT A PARTICULAR POINT-OF-TIME, OR OVER TIME
• COMPLEMENT VARIOUS OTHER TYPES OF RESEARCH, INCLUDING EXPERIMENTAL
RESEARCH (RANDOM SAMPLING, CONTROL GROUP VS. EXPERIMENTAL GROUP)
• MAY BE USED AT ANY TIME IN THE RESEARCH PROCESS FOR VARYING PURPOSES
• IDENTIFY TRENDS OVER TIME FOR PARTICULAR POPULATIONS (IN PARTICULAR
CONTEXTS)
• USUALLY INVOLVES BOTH QUALITATIVE AND QUANTITATIVE DATA (MIXED-METHODS /
MULTI METHODS) DATA COLLECTION AND ANALYSIS
14. THE SURVEY INSTRUMENT
• IS DESIGNED FOR PARTICULAR PURPOSES
• IS WRITTEN IN AN UNDERSTANDABLE WAY (“STANDARD LANGUAGE”); IF IN A FOREIGN
LANGUAGE, ACHIEVED BY A PROFESSIONAL TRANSLATOR OR NATIVE SPEAKER (NOT
MACHINE-TRANSLATION)
• USES CLOSED-ENDED QUESTIONS APPROPRIATELY WITH A FULL RANGE OF CHOICES (NO
FALSE LIMITS)
• IF SCALED RESPONSES, PROPER SCALING (LIKE LIKERT-LIKE SCALES AND CONSISTENCY OF ORDER;
OR FORCED-CHOICE 4-POINT LIKERT SCALES WITH NO FENCE-SITTING NEUTRALITY); IF SCALED
RESPONSES, PROPER CONSISTENCY IN TERMS OF DIRECTION (HIGHEST-TO-LOWEST FOR ALL
QUESTIONS; OR LOWEST-TO-HIGHEST FOR ALL QUESTIONS)
• USES OPEN-ENDED QUESTIONS APPROPRIATELY, WITH SUFFICIENT DIRECTION AND SPACE
FOR A FULL TEXTUAL RESPONSE
• IS ACCESSIBLE FOR ALL THOSE WITH A RANGE OF SPECIAL AND OTHER NEEDS
(TRANSCRIPTIONS AND TIMED TEXT FOR VIDEOS, ALT TEXT FOR IMAGES, ETC.)
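The scaling-consistency point above (one consistent direction across all scaled responses) often requires reverse-scoring negatively worded Likert items before analysis. A minimal Python sketch; the function name, 5-point scale, and sample responses are all illustrative assumptions, not from the deck:

```python
def reverse_score(response, scale_min=1, scale_max=5):
    """Reverse-score a Likert response so every item points the same direction."""
    return scale_max + scale_min - response

# Hypothetical responses to a negatively worded 5-point item:
raw = [1, 2, 4, 5, 3]
aligned = [reverse_score(r) for r in raw]
print(aligned)  # [5, 4, 2, 1, 3]
```

Once reverse-scored, such items can be summed or averaged with the rest of the scale without flipping its direction.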
15. THE SURVEY INSTRUMENT (CONT.)
• ALIGNS THE QUESTIONS WITH THE APPROPRIATE DATA TYPES [CATEGORICAL, ORDINAL
(RANK ORDER), NUMERICAL (DISCRETE, CONTINUOUS), TEXT-BASED / AUDIO-BASED /
VIDEO-BASED, AND OTHERS]
• INCLUDES INFORMED CONSENT AT THE BEGINNING; ENABLES OPT-OUT AT ANY TIME;
NO COLLECTION OF EXCESS INFORMATION; NO DECEPTION (UNLESS APPROVED BY THE
INSTITUTIONAL REVIEW BOARD / IRB)
• IS INFORMED BY THE RESEARCH LITERATURE (EXPLORED TO “SATURATION”)
• IS STRATEGICALLY SEQUENCED
• AVOIDS FORCING RESPONSES BECAUSE OF THE PARTICIPANT OPT-OUT ISSUE PER IRB
GUIDELINES (DEBATABLE)
16. THE SURVEY INSTRUMENT (CONT.)
• AVOIDS ANY BIASING DESIGN OR LEADING LANGUAGE
• IS PILOT-TESTED WITH BOTH EXPERTS AND WITH PEOPLE WHO ARE SIMILAR TO
RESPONDENTS, WITH CHANGES MADE TO ENSURE LANGUAGE CLARITY;
COMPREHENSIVENESS OF THE SURVEY; CLEAR TRANSITIONS; ACCESSIBILITY; AND
CORRECTIONS OF ALL KNOWN ERRORS (AND CONTINUING TESTING UNTIL NO OTHER
ERRORS ARE FOUND)
• IS TESTED FOR RELIABILITY (THAT IT IS DEPENDABLE AND CONSISTENT)
• IS TESTED FOR VALIDITY (THAT IT MEASURES WHAT IT PURPORTS TO MEASURE)
• ALIGNS WITH DOMAIN’S PROFESSIONAL RESEARCH STANDARDS AND EXPECTATIONS
• MAY BE VERSIONED FOR DIFFERENT GROUPS, OR MAY BE BRANCHED FOR CERTAIN GROUPS
• MUST MAINTAIN COMPARABILITY IF STUDIED FOR TREND DATA FOR LONGITUDINAL
RESEARCH
17. SURVEY RELIABILITY
• ACHIEVING THE SAME RESULTS EVERY TIME THE INSTRUMENT IS USED, SUCH AS
THROUGH TEST-RETEST RELIABILITY WITH THE SAME PERSON OR GROUP (OVER TIME);
CONSISTENCY OF PERFORMANCE
• RELIABILITY ACROSS DIFFERENT INSTRUMENTS OR “EQUIVALENCE RELIABILITY”
18. SURVEY RELIABILITY (CONT.)
• INTERNAL CONSISTENCY OF MEASURE [CRONBACH’S ALPHA (α) / COEFFICIENT ALPHA; THE
COMPLEMENTARITY OF QUESTIONS IN RELATION TO EACH OTHER IN MEASURING ONE
DIMENSION OR A SINGLE CONSTRUCT (UNIDIMENSIONAL); INTER-CORRELATIONS AMONG
THE TEST ITEMS; NOT ROBUST UNDER CONDITIONS OF MISSING DATA; VARIABLES AND THE
DEGREE TO WHICH THEY MEASURE THE SAME THING IN AN INTER-ITEM CORRELATION WAY AS
EXPRESSED IN A MATRIX AND COMPARISONS DONE BY REMOVING VARIABLES TO SEE WHAT
CHANGES OCCUR IN THE MEASURING OF THE CONSTRUCT; A LATENT CONSTRUCT MAY
AFFECT THE ALPHA; α < 1]
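The internal-consistency measure on slide 18 can be computed directly from item responses. A minimal sketch of coefficient alpha in Python, assuming complete data (recall the slide's caveat that the measure is not robust to missing data); the sample items are hypothetical:

```python
def cronbach_alpha(items):
    """Coefficient (Cronbach's) alpha for a list of items, where each item
    is a list of scores, one score per respondent (complete data assumed)."""
    k = len(items)           # number of items
    n = len(items[0])        # number of respondents

    def variance(xs):        # population variance
        mean = sum(xs) / len(xs)
        return sum((x - mean) ** 2 for x in xs) / len(xs)

    # Total score per respondent across all items:
    totals = [sum(item[j] for item in items) for j in range(n)]
    item_variance_sum = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))

# Two perfectly parallel (hypothetical) items yield alpha = 1.0:
print(cronbach_alpha([[1, 2, 3], [1, 2, 3]]))  # 1.0
```

In practice this is what SPSS's reliability analysis reports; the point of the sketch is only to make the inter-item-variance logic on the slide concrete.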
19. SURVEY VALIDITY
• ACCURATE MEASUREMENT OF WHAT IT WAS DESIGNED TO MEASURE; VALID TO THE
TASK
• DIFFERENT TYPES OF VALIDITY:
• PREDICTIVE VALIDITY (PROJECTS TO THE FUTURE)
• CONCURRENT VALIDITY (AGAINST AN ACCEPTED MEASURE)
• CONTENT VALIDITY [REASONABLE SAMPLE OF RELATED INFORMATION AND PROPER TERMS
FOR WHAT THE SURVEY WANTS TO SAMPLE (FINK, 2013, P. 67)]
• CONSTRUCT VALIDITY (USING THE INSTRUMENT ON RESPONDENTS WHO’VE BEEN
ESTABLISHED BY EXPERTS TO RATE A PARTICULAR WAY ON A PARTICULAR SCALE ON A
PARTICULAR CONSTRUCT TO SEE IF THE TARGET SURVEY COMES UP WITH THE SAME RESULTS)
20. THE CREDIBILITY OF SURVEY FINDINGS
• “RELIABILITY” AND “VALIDITY” ARE DEVELOPED TO SUPPORT THE “MEASUREMENT
VALIDITY” AND ULTIMATELY THE CREDIBILITY OF SURVEY FINDINGS
• ALSO NEED TO CONTROL FOR “ERROR,” WHICH COMES FROM MANY SOURCES:
• REPRESENTATIVE SAMPLING, ELIGIBILITY CRITERIA OF THOSE TAKING THE SURVEY, LOW
RESPONSE RATES, ATTRITION OF PARTICIPANTS (PARTICULARLY IN LONGITUDINAL RESEARCH)
• RESEARCHER EFFECTS: COGNITIVE BIASES, INCENTIVES, WEAKNESSES
• RESEARCH DESIGN
• INSTRUMENT DESIGN
• ADMINISTRATION
• FOLLOW-ON SURVEY WITHOUT SUFFICIENT PASSAGE OF TIME (AND THE EFFECT OF THE FIRST
SURVEY’S RESULTS ON THE LATTER)
• EXCLUSION / INCLUSION OF OUTLIER DATA POINTS
• INSUFFICIENT ANALYSIS AND REFINEMENT
21. THE TIME FACTOR
• CROSS-SECTIONAL OR SLICE-IN-TIME SURVEYING
• MULTIPLE-SEQUENTIAL SURVEYING
• LONGITUDINAL (OR PERIODIC OVER-TIME SURVEYING)
22. SAMPLING OF SURVEY RESPONDENTS
• RANDOM (AND SUFFICIENT) SAMPLING IS THE “GOLD STANDARD” FOR GENERALIZING TO
A POPULATION
• STRATIFIED RANDOM SAMPLING TO SELECT MEMBERS OF PARTICULAR GROUPS AS
RESPONDENTS
• SIMPLE RANDOM CLUSTER SAMPLING (CONVENIENCE SAMPLING, ASSUMPTION OF PRE-
DEFINED CLUSTERS IN THE POPULATION)
• CONVENIENCE SAMPLING (LIKE SNOWBALL SAMPLING, WHICH BIASES TOWARDS
HIGHLY CONNECTED ACTORS/AGENTS); NON-RANDOM; GOLD STANDARD AS
“REPRESENTATIVE” SAMPLING FOR QUALITATIVE RESEARCH
23. SAMPLING OF SURVEY RESPONDENTS (CONT.)
• SYSTEMATIC (STRATIFIED RANDOM) SAMPLING (LIKE EVERY 5TH PERSON…, MAY HAVE HIDDEN
IF UNINTENTIONAL BIASES, WITH A COMMON EXAMPLE AS A-Z SAMPLING BUT WITH FEWER
INDIVIDUALS WITH NAMES IN THE W-Z RANGE)
• OPEN-CALL SAMPLING WITH AN ONLINE SURVEY
• BIAS IN TERMS OF THOSE WHO SELF-SELECT IN OR OPT-IN, HAVE INTEREST, HAVE TECHNO ACCESS
AND SAVVY, ARE MORE ACTIVIST (MAY NOT REPRESENT QUIETER VOICES)
• POTENTIAL DIFFICULTY IN VERIFYING IDENTITY (EXCEPT INTERNET PROTOCOL OR “IP” ADDRESSES)
• POTENTIAL BROADER GEOGRAPHIC REACH THAN OTHERWISE
24. SAMPLING OF SURVEY RESPONDENTS (CONT.)
• CASE CONTROL: CASE GROUP (“EXTANT” CONDITION) AND CONTROL GROUP (ABSENCE OF
“EXTANT” CONDITION) FOR COMPARISON AND CONTRAST AND POTENTIAL GENERALIZING
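The sampling designs on slides 22 through 24 can be prototyped in code before a survey is fielded. A minimal sketch of stratified random sampling in Python; the respondent pool, the "role" stratum, and the fixed seed are all hypothetical assumptions for illustration:

```python
import random

def stratified_sample(population, strata_key, per_stratum, seed=42):
    """Draw a simple random sample of fixed size from each stratum."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    strata = {}
    for person in population:
        strata.setdefault(strata_key(person), []).append(person)
    sample = []
    for group in strata.values():
        sample.extend(rng.sample(group, min(per_stratum, len(group))))
    return sample

# Hypothetical pool: 10 staff, 20 students.
pool = [{"id": i, "role": "staff" if i % 3 == 0 else "student"} for i in range(30)]
picked = stratified_sample(pool, lambda p: p["role"], per_stratum=5)
print(len(picked))  # 10 respondents: 5 from each stratum
```

The same helper degenerates to simple random sampling if every member is placed in one stratum.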
25. ONLINE SURVEYS
• RESEARCHER NEEDS TO KNOW AND DEPLOY THE TECHNOLOGY WELL
• MUST PROTECT THE DATA WELL TO MEET ALL LEGAL GUIDELINES (GOING WITH A TRUSTED
SURVEY COMPANY)
• MUST PROTECT PARTICIPANT PRIVACY AND CONFIDENTIALITY
• MUST DE-IDENTIFY DATA / ANONYMIZE BEFORE DATA INGESTION INTO AN ANALYSIS TOOL
(OR DATASET SHARING THROUGH REPOSITORIES OR “REPRODUCIBLE RESEARCH” ARTICLES)
• MUST OFFER OPT-OUT FUNCTION AT ANY TIME (FOR IRB STANDARDS)
• MUST ANTICIPATE POTENTIAL HARM AND MITIGATE
26. ONLINE SURVEYS (CONT.)
• ONLINE SURVEY MUST BE FULLY COMPREHENSIBLE WITHOUT SURVEY TAKER INTERVENTION
(DESIGNED TO HEAD OFF POTENTIAL MISINTERPRETATION WITH ADDITIONAL OPT-IN DATA AS
NEEDED)
• MAY HAVE “ENUMERATORS” OR ASSISTANTS STAND IN THE LITERACY / NUMERACY GAP
• DATA USUALLY A MIX OF QUANTITATIVE AND QUALITATIVE DATA
• MAY BE EXPORTED AS .CSV, .DOCX, .PDF, AND OTHER FILE TYPES (MANY OF WHICH ARE
TRANSCODABLE WITH LITTLE EFFORT)
• MAY BE PARTIALLY EXPORTED IN PRE-MADE TABLES, CHARTS, AND GRAPHS
27. DATA FORMS
• .XLSX DATA TABLES (FOR QUANT DATA)
• .CSV TEXT FILES, .DOC AND .DOCX TEXT FILES
• SOME PRE-EXTRACTED BAR CHARTS FROM THE ONLINE SURVEY SYSTEMS
• AUDIO
• STILL IMAGERY
• VIDEO
29. GENERAL PRINCIPLES AND PRACTICES OF DATA TYPES
QUANTITATIVE
REPRODUCIBLE
LAB-BASED, PRE- AND POST-
PLACEBO APPROACH
EMPIRICALLY OBSERVABLE / MEASURABLE
GENERALIZABLE
MANAGING SUBJECTIVITY THROUGH
METHODS, OVERSIGHT, TOOLS, AND PEER
REVIEW
“IN THE GLASS,” IN THE LAB (IN VITRO)
QUALITATIVE
“ISOLATE” AND CASE BASED
TRIANGULATION
SATURATION
NON-GENERALIZABLE
MANAGING HUMAN SUBJECTIVITY
THROUGH RESEARCHER AWARENESS AND
DISCLOSURE, RESTRICTION OF CLAIMS
“IN THE BODY,” IN THE WORLD (IN VIVO)
MIXED METHODS
SYNTHESIZED RESEARCH USING QUANT
AND QUAL METHODS, DATA, THEORIES,
AND PARADIGMS
SEQUENCE OF RESEARCH COMBINING
QUANTITATIVE AND QUALITATIVE
RESEARCH METHODS (EACH TREATED
MORE DISCRETELY)
MULTI METHODS
30. SOME AFFORDANCES OF ONLINE SURVEYS
• ADAPTABLE TO VARIOUS TYPES OF RESEARCH APPROACHES AND METHODS
• MAY BE STAND-ALONE OR COMPLEMENTARY TO OTHER RESEARCH SOURCES / DATA STREAMS
• ENABLES…
• A BROAD GEOGRAPHICAL BREADTH OF RESEARCH AND ACCESS
• BROAD INTEGRATION OF MULTIMEDIA (AUDIO, VIDEO, INTERACTIVE MAPS, SIMULATIONS, AND OTHERS)
• MULTI-LINGUAL APPROACHES
• CONDITIONAL BRANCHING
• CAPTURING A BROAD RANGE OF DATA
• SOME SURVEILLANCE AGAINST HACKING AND MIS-USE
• BUILT-IN DATA PROTECTIONS
31. SOME AFFORDANCES OF ONLINE SURVEYS (CONT.)
• RICH QUESTION TYPES IN QUALTRICS:
• TEXT/GRAPHIC QUESTIONS
• MULTIPLE-CHOICE QUESTIONS
• MATRIX TABLE QUESTIONS
• TEXT ENTRY QUESTIONS
• SLIDER QUESTIONS
• RANK ORDER QUESTIONS
• SIDE-BY-SIDE QUESTIONS
• CONSTANT SUM QUESTIONS
• PICK, GROUP, AND RANK QUESTIONS
• HOT SPOT QUESTIONS
• HEAT MAP QUESTIONS
• GRAPHIC SLIDER QUESTIONS
• GAP ANALYSIS QUESTIONS
• DRILL DOWN QUESTIONS
• INVISIBLE QUESTIONS: TIMING, META
INFORMATION, FILE UPLOAD, AND CAPTCHA
VERIFICATION
32. SOME AFFORDANCES OF ONLINE SURVEYS (CONT.)
• OTHER ADVANCED AFFORDANCES:
• BRANCHING LOGIC
• PIPED TEXT (CUSTOMIZED TEXT, SEQUENCES,
AND INTERACTIONS)
• E-MAIL AND PANEL TRIGGERS
• QUOTAS
• GOOGLE TRANSLATE INTEGRATION
• VIEWING PANELS, AND OTHERS
34. DATA HANDLING
• DOWNLOADING RAW DATA AND KEEPING A PRISTINE SET UNTOUCHED BY ANALYSTS OR ANYONE
• LEAST LOSSY DATA MAINTENANCE (OFTEN IN NATIVE STRUCTURE, SUCH AS THE MOST HIGH RESOLUTION
FOR IMAGERY AND THE MOST DETAILED AUDIO-VISUAL FILE TYPES)
• CLEAR DATA PROVENANCE: RECORD-KEEPING ABOUT WHERE DATA COME FROM AND THE
PARAMETERS FOR THE DATA COLLECTION / EXTRACTION
• PRESERVATION OF DATA AT ALL STAGES, FROM RAW TO PROCESSED (NOT RETROACTIVELY
CHANGING UP DATA); FUTURE-PROOFING AGAINST POTENTIAL DATA INACCESSIBILITY IN THE
FUTURE
• CONSISTENT FILE NAMING PROTOCOLS
• README FILES
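One way to operationalize the "pristine set untouched by analysts" and data-provenance points above is to record a cryptographic checksum of each raw export alongside the README; any later corruption or tampering changes the digest. A minimal sketch (the sample export bytes are hypothetical):

```python
import hashlib

def fingerprint(data: bytes) -> str:
    """SHA-256 digest of a raw export; store it with the README /
    provenance notes so the pristine copy can be verified later."""
    return hashlib.sha256(data).hexdigest()

# Hypothetical raw survey export:
raw_export = b"respondent_id,q1,q2\n1,4,Agree\n2,2,Disagree\n"
print(fingerprint(raw_export)[:16])  # first 16 hex chars of the digest
```

Re-running `fingerprint` on the archived file at any later stage, and comparing against the recorded digest, confirms the raw data were never retroactively changed.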
35. DATA HANDLING (CONT.)
• PROTECTING AGAINST DATA CORRUPTION AND DATA TAMPERING
• PROTECTING AGAINST DATA LEAKAGE (AND POTENTIAL MIS-USE)
• LEAST-PRIVILEGE PROTECTIONISM
• METADATA SCRUBBING / STRIPPING OFF FOR THE WORKING FILES
• ANONYMIZATION (IDENTITY INVISIBLE EVEN TO THE RESEARCHER)
• PSEUDONYMIZATION (IDENTITY REASSIGNED FOR PUBLICATION, BUT IDENTITY VISIBLE TO THE
RESEARCHER)
• NO PREMATURE RELEASE OF DATA; NO UNAUTHORIZED RELEASE OF DATA
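Slide 35 distinguishes anonymization (identity invisible even to the researcher) from pseudonymization (identity reassigned for publication but recoverable by the researcher). A minimal pseudonymization sketch; the records, field names, and pseudonym format are hypothetical:

```python
import itertools

def pseudonymize(records, id_field="email"):
    """Replace a direct identifier with a stable pseudonym. The returned
    mapping stays with the researcher only; published data carry just
    the pseudonyms (pseudonymization, not anonymization)."""
    counter = itertools.count(1)
    mapping = {}
    cleaned = []
    for rec in records:
        real = rec[id_field]
        if real not in mapping:
            mapping[real] = f"P{next(counter):03d}"
        out = dict(rec)
        out[id_field] = mapping[real]
        cleaned.append(out)
    return cleaned, mapping

records = [{"email": "a@x.org", "q1": 4},
           {"email": "b@x.org", "q1": 2},
           {"email": "a@x.org", "q1": 5}]
cleaned, mapping = pseudonymize(records)
print([r["email"] for r in cleaned])  # ['P001', 'P002', 'P001']
```

For true anonymization, the mapping would be discarded (and quasi-identifiers such as rare demographic combinations reviewed as well).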
36. DATA HANDLING (CONT.)
• PROTECTION OF ALL PARTICIPANTS IN THE RESEARCH (INCLUDING THEIR DATA AND PRIVACY)
• FOLLOWING FORMAL GUIDANCE IN TERMS OF DATA HANDLING (INSTITUTIONAL REVIEW
BOARD RULES)
• RESEARCHER INTIMACY WITH THE DATA
37. DATA ANALYSIS
DATA CODING:
• THE USES OF THEORIES, MODELS, AND / OR FRAMEWORKS TO UNDERSTAND DATA AND
INFORMATION: CODING BASED ON EXPECTATIONS FROM THEORIES, MODELS, AND / OR
FRAMEWORKS (~ TO AN A PRIORI APPROACH)
• EMERGENT INTERPRETATION: CODING BASED ON EXTRACTED INFORMATION AND SEEING WHERE
IT GOES (~ TO A GROUNDED THEORY APPROACH)
• MIXED CODING APPROACHES: COMBINING BOTH A PRIORI AND EMERGENT CODING METHODS
• IF TEAM CODING, FOCUSING ON VARIOUS ASPECTS OF CONSENSUS (INTERRATER RELIABILITY)
OR DISSENSUS…AND OTHER FACTORS
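The a priori coding approach above can be roughed out in code: a codebook of expected codes and indicator terms is fixed in advance and then applied to open-ended responses (emergent coding, by contrast, lets categories surface from the data). A toy sketch; the codebook and the sample response are hypothetical:

```python
# A priori codebook fixed before reading any responses (hypothetical codes/terms):
CODEBOOK = {
    "satisfaction": {"satisfied", "happy", "great"},
    "support": {"help", "helpdesk", "support"},
}

def autocode(response: str):
    """Tag a free-text response with every code whose indicator terms appear."""
    words = set(response.lower().split())
    return sorted(code for code, terms in CODEBOOK.items() if words & terms)

print(autocode("Great support from the helpdesk"))  # ['satisfaction', 'support']
```

Real qualitative coding (e.g., in NVivo) also handles stemming, phrases, negation, and interrater-reliability checks; this sketch only shows the a priori shape.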
38. DATA ANALYSIS (CONT.)
• MAINTENANCE OF RESEARCH JOURNALING / MEMO-ING / ANNOTATIONS, AND COLLECTION
OF THINKING ABOUT DATA ANALYSIS
• NOT OVER-ASSERTING AND NOT GENERALIZING FROM THE INFORMATION
• APPLYING DIFFERENT METHODS TO ANALYZING DATA (WITH CLEAR DOCUMENTATION OF
EACH APPROACH, EACH TOOL, AND THE OVERALL DATA PROCESSING AND ANALYTICAL
SEQUENCE); A PREFERENCE FOR SIMPLE METHODS WITH GOOD DATA
• JOURNALING / RECORD-KEEPING OF INSIGHTS AT EACH STAGE
39. DATA VISUALIZATIONS
• USING DIFFERENT DATA VISUALIZATIONS FOR DIFFERENT KNOWLEDGE EXTRACTIONS
• LABELING DATA AND OFFERING LEGENDS
• INFORMATIVE LEAD-UP AND LEAD-AWAY TEXT TO DATA VISUALIZATIONS
• SUPPORT FOR INFORMATION CLARITY; AVOIDANCE OF NEGATIVE LEARNING FROM THE DATA
VISUALIZATIONS (CONTROLLING FOR MIS-IMPRESSIONS)
40. DATA VISUALIZATIONS (CONT.)
• PROVISION OF ACCESS TO UNDERLYING (EVEN RAW) DATA (SUCH AS IN TABLES) AND
DATASETS
• NO MANIPULATIONS OF THE DATA CONSUMERS (NO POLITICAL OVERRIDE, NO IMPRESSION
MANAGEMENT, NO MANIPULATION OF AFFECT / EMOTION OR COGNITION)
• REVELATION OF SUBJECTIVITIES
42. ABOUT THE ONLINE SURVEY SYSTEM
• A CLOUD-BASED TOOL: INTERFACE “TOUCHINESS,” DATA STORAGE STATESIDE,
SOPHISTICATED TOOL DESIGN, MULTIMEDIA INTEGRATION CAPABILITY
• LIBRARIES: AVAILABLE SURVEY TEMPLATES AND TEMPLATE LIBRARIES
• HELP: PROFESSIONAL, FRIENDLY, AND READY HELP SUPPORTS
• SURVEY REPORTING FEATURE: SURVEY STATISTICS, SURVEY DURATIONS, COMPLETION
PERCENT, QUESTION RESPONSE RATES, “DROP OUTS” OR LAST ANSWERED QUESTION
COUNTS
• ADDITIONAL CAPTURING OF “TRACE” DATA
43. ABOUT THE ONLINE SURVEY SYSTEM (CONT.)
• PILOT TESTING TO CATCH SOME ISSUES WITH THE SURVEY INSTRUMENT:
• AVOIDANCE OF BIAS
• QUESTION PHRASING AND RESPONDENT COMPREHENSION
• LOW QUESTION RESPONSE RATES
• SURVEY RESPONDENT DROPOUT RATES
• TECHNOLOGY BEHAVIOR (PARTICULARLY IN THE ON-GROUND CONTEXT)
• DESIGN FLAWS
• … AND OTHERS
44. RESPONDENT DEMOGRAPHIC DATA
• INFORMATION ABOUT AGE, GENDER, BACKGROUND, LANGUAGE, REGION, EARNINGS (SES),
AND OTHER INFORMATION
• MAY BE USED TO CLUSTER RESPONDENTS ACCORDING TO GROUPNESS IN ORDER TO EXPLORE
POTENTIAL PATTERNING AND TO POSE PARTICULAR QUESTIONS
• MAY CREATE GROUPNESS ON NON-DEMOGRAPHIC FACTORS LIKE ATTITUDES TOWARDS
PARTICULAR ISSUES, CERTAIN STATES-OF-BEING LIKE HEALTH STATUS, AND OTHERS
45. WITHIN-QUALTRICS DATA ANALYTICS
• CROSS-TABULATION ANALYSIS (AKA
“CONTINGENCY TABLE”)
• CAPTURES MULTIVARIATE FREQUENCY DISTRIBUTIONS
(INCLUDING WITH CATEGORICAL DATA)
• DEFINED “BANNERS” (COLUMNS) AND “STUBS”
(ROWS)
• ENABLES USAGE OF DATA FROM MULTIPLE-CHOICE
QUESTIONS, MATRIX QUESTIONS, AND EMBEDDED
DATA; ACCESS TO ORIGINAL QUESTIONS AND DATA
• INCLUDES CALCULATION OF CHI-SQUARE MEASURE,
DEGREES OF FREEDOM (DF), AND P-VALUE
(COMPARED AGAINST AN ALPHA OF .05 OR .01
TO REJECT THE NULL HYPOTHESIS)
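• A MINIMAL PURE-PYTHON SKETCH OF THE CROSS-TAB CHI-SQUARE CALCULATION ABOVE (THE BANNER / STUB COUNTS ARE HYPOTHETICAL; QUALTRICS COMPUTES THIS AUTOMATICALLY):

```python
# Chi-square test of independence on a hypothetical 2x2 contingency table
# (rows = "stubs", columns = "banners").

def chi_square(table):
    """Return (chi-square statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Hypothetical counts: two respondent groups x yes/no answers.
table = [[30, 10],
         [20, 40]]
stat, df = chi_square(table)
# Compare stat against the critical value for df=1 at alpha = .05 (about 3.841);
# if stat exceeds it, reject the null hypothesis of independence.
```

IN PRACTICE, A STATISTICS PACKAGE WOULD ALSO RETURN THE EXACT P-VALUE; THIS SKETCH ONLY PRODUCES THE STATISTIC AND DF FOR COMPARISON AGAINST A CRITICAL-VALUE TABLE.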
48. DATA EXPORT FOR PROCESSING AND
ANALYSIS IN OTHER TOOLS
OTHER OFF-PLATFORM ANALYSES
• DESCRIPTIVE ANALYSIS ABOUT RESPECTIVE SUB-
POPULATIONS BASED ON CROPS, ON LOCATIONS, ON
GENDER
• INFERENTIAL STATISTICS ABOUT RESPECTIVE SUB-
POPULATIONS
• CONTENT ANALYSES OF RESPONSES (BASED ON TEXT)
• TESTS OF HYPOTHESES BASED ON THE AVAILABLE
EXTRACTED DATA (PARTICULARLY PREDICTOR AND
OUTCOME VARIABLES, FROM OBSERVED DATA)
SOME REQUIREMENTS
• EXPORT FOR ANALYSIS IN OTHER TOOLS (USUALLY
AS TEXT OR AS DATASETS)
• DATA RE-STRUCTURING (AND CLEANING) FOR
VARIOUS TYPES OF QUERYING
• SETUP OF DUMMY VARIABLES
• SETUP OF DIFFERENT DATASETS
• DATA NORMALIZING FOR COMPARABILITY ACROSS
SETS
• FULLY-AUTOMATED MACHINE-BASED DATA
EXTRACTIONS
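• THE DUMMY-VARIABLE SETUP AND DATA-NORMALIZING REQUIREMENTS ABOVE CAN BE SKETCHED AS FOLLOWS (FIELD NAMES AND VALUES ARE HYPOTHETICAL):

```python
# Sketch: one-hot dummy variables for a categorical field, plus min-max
# normalization so numeric columns are comparable across datasets.

def one_hot(values):
    """Map a list of category labels to 0/1 dummy-variable dicts."""
    categories = sorted(set(values))
    return [{c: int(v == c) for c in categories} for v in values]

def min_max(values):
    """Rescale numbers to the [0, 1] range for cross-set comparability."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

crops = ["maize", "wheat", "maize", "chickpea"]   # hypothetical categorical data
yields = [12.0, 30.0, 21.0, 3.0]                  # hypothetical numeric data

dummies = one_hot(crops)
scaled = min_max(yields)
```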
50. EXCEL: DATA PROCESSING AND VISUALIZATION
• ERASE ALL BLANK CELLS: HIGHLIGHT THE CELLS IN COLUMN A. IN THE HOME TAB IN EXCEL, GO TO THE
EDITING AREA. CLICK SORT & FILTER, AND SORT A TO Z. PRESS F5, CLICK THE SPECIAL
BUTTON, AND CHOOSE THE RADIO BUTTON NEXT TO BLANKS. CLICK “OK.” THEN, PRESS CTRL + -, AND
SELECT “SHIFT CELLS UP.” THIS DELETES ALL EMPTY CELLS IN THE SELECTION.
• ALPHABETIZE CELLS: TO PROCESS THE MIXED TEXTUAL, NUMBER, AND DATE DATA, FIRST DELETE
THE QUESTION AND QUESTION LABEL (OFTEN IN CELLS A1 AND A2), THEN SORT THE COLUMN A TO Z. (IF THIS
INFORMATION IS STILL NEEDED, PASTE IT INTO SOME OTHER CELLS, LIKE J1 AND J2.)
• FILTERING DATA: SELECTING OUT PARTICULAR INFORMATION FOR ATTENTION OR PROCESSING
• DATA VISUALIZATIONS: FREQUENCY BAR CHARTS
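• THE SAME BLANK-REMOVAL AND A-TO-Z SORT CAN BE APPROXIMATED IN CODE (THE COLUMN VALUES ARE HYPOTHETICAL):

```python
# Code equivalent of the Excel steps above: drop blank cells from a column
# of survey responses, then sort A to Z.

column_a = ["wheat", "", "maize", None, "chickpea", "  ", "sesame"]

# Keep only non-empty, non-whitespace cells; trim stray spaces; sort.
cleaned = sorted(v.strip() for v in column_a if v and v.strip())
print(cleaned)  # ['chickpea', 'maize', 'sesame', 'wheat']
```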
51. EXCEL: DATA PROCESSING AND VISUALIZATION (CONT.)
• MORE BASIC PROCESSING IN EXCEL
• DATA EXTRACTIONS FROM THE WEB: “POWER QUERY” ADD-IN TO EXCEL 2013 (AND
BACKWARDS COMPATIBLE TO 2010)
• INTEGRATIONS OF DATA FROM DATABASES, AZURE CLOUD, HADOOP (HDFS) DATA THROUGH
POWER QUERY TAB
• DATA MAPPING AND VISUALIZATIONS: “POWER VIEW” ADD-IN TO EXCEL 2013 (AND
BACKWARDS COMPATIBLE TO 2010)
52. MICROSOFT WORD: SIMPLE TEXT PROCESSING
• TEXT COUNTS FROM THE LARGEST PHRASES FIRST…AND THEN THE SMALLER WORDS (USING
SEARCH + REPLACE)
• PROCESSING NUMERICAL RESPONSES TO IMAGE SELECTION AS A SURVEY QUESTION
RESPONSE (WITH NUMBERED LABELS FOR THE IMAGES)
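• THE PHRASES-FIRST COUNTING ABOVE CAN BE SKETCHED IN CODE: COUNT AND STRIP THE LONGEST PHRASES FIRST, THEN COUNT THE REMAINING SINGLE WORDS (THE PHRASE LIST AND RESPONSES ARE HYPOTHETICAL):

```python
# Count multi-word phrases before single words, mirroring the
# largest-phrases-first search-and-replace workflow in Word.
from collections import Counter

responses = [
    "the extension office helped with seed selection",
    "seed selection training at the extension office",
    "more training needed",
]
phrases = ["extension office", "seed selection"]  # longest phrases first

phrase_counts = Counter()
stripped = []
for text in responses:
    for p in phrases:
        phrase_counts[p] += text.count(p)
        text = text.replace(p, " ")  # remove so words aren't double-counted
    stripped.append(text)

word_counts = Counter(w for text in stripped for w in text.split())
```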
53. NOTEPAD OR OTHER TEXT EDITOR: CLEANING TEXT
• CLEANING TEXT BETWEEN SOFTWARE PROGRAMS
• ENABLING READABILITY OF HTML OR XML FILES
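• A MINIMAL SKETCH OF SUCH BETWEEN-PROGRAM TEXT CLEANUP WITH THE PYTHON STANDARD LIBRARY (THE TAG-STRIPPING REGEX IS A SIMPLIFICATION, NOT A FULL HTML PARSER):

```python
# Strip HTML/XML tags, decode entities, and normalize whitespace.
import html
import re

def clean(raw):
    text = re.sub(r"<[^>]+>", " ", raw)   # drop tags
    text = html.unescape(text)            # &amp; -> &, etc.
    return re.sub(r"\s+", " ", text).strip()

print(clean("<p>Maize &amp; wheat<br/>yields</p>"))  # Maize & wheat yields
```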
55. NVIVO 10: WORD FREQUENCY COUNTS, TEXT
SEARCHES, WORD PROXIMITY EXPLORATION, TEXT
MAPPING
• DATA CODING (BY BOTH HUMAN AND
MACHINE)
• DATA PROCESSING
• DATA COLLECTION (FROM SOME SOCIAL
MEDIA PLATFORMS, OR OTHER
COMPLEMENTARY STREAMS OF DATA)
• WORD FREQUENCY COUNTS
• WORD TREES / TEXT FINDS
• WORD PROXIMITY EXPLORATION AND
ANALYSES
• TEXT MAPPING AND VISUALIZATIONS
• GEOGRAPHICAL MAPPING (WITH RELATED
GEOGRAPHICAL INFORMATION)
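• WORD PROXIMITY EXPLORATION (AS IN NVIVO'S "NEAR" QUERIES) CAN BE APPROXIMATED BY COUNTING CO-OCCURRENCES WITHIN A SLIDING WINDOW (THE TEXT AND WINDOW SIZE ARE HYPOTHETICAL):

```python
# Count how often pairs of words appear within n tokens of each other.
from collections import Counter

def cooccurrences(words, window=5):
    """Count word pairs appearing within `window` tokens of each other."""
    pairs = Counter()
    for i in range(len(words)):
        for j in range(i + 1, min(i + window, len(words))):
            pairs[tuple(sorted((words[i], words[j])))] += 1
    return pairs

tokens = "farmers plant maize and farmers harvest maize".split()
pairs = cooccurrences(tokens, window=3)
```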
56. AUTOMAP AND ORA NETSCENES:
EXTRACTION OF CONTENT NETWORKS FROM TEXTS
AND TEXT CORPUSES
AUTOMAP
• TEXT-BASED CONTENT NETWORKS
• APPLICATION OF CUSTOMIZED
THESAURUSES
• LEXICAL NETWORKS, AND OTHERS
ORA NETSCENES
• VISUALIZING CONTENT NETWORKS AS
GRAPHS
57. NODEXL, UCINET: NETWORK GRAPHING
• TEXT NETWORK GRAPHING
• NETWORK VISUALIZATION AND ANALYSIS, AND OTHERS
58. TABLEAU (PUBLIC), ARCMAP/ARCGIS PRO:
GEOGRAPHICAL MAPPING
• MAPPING AND VISUALIZING INFORMATION (TABLEAU)
• USEFUL FOR FAST PROOF-OF-CONCEPTS
• MAPPING INFORMATION TO PHYSICAL LOCATIONS (TABLEAU AND ARCMAP/ ARCGIS PRO)
59. RAPIDMINER STUDIO: DATAMINING
• DATA DESCRIPTIONS: PATTERN
IDENTIFICATION VIA SCATTER MATRIX
VISUALIZATIONS (BASED ON VARIABLES);
CLASSIFICATION MODELS (NAÏVE BAYES,
AND OTHERS)
• VARIABLE ASSOCIATIONS: LINEAR
REGRESSIONS; LOGISTIC REGRESSIONS
(WITH BINARY TARGET VARIABLES)
• MACHINE LEARNING ALGORITHMS FOR
LATENT DATA: ARTIFICIAL NEURAL
NETWORKS (EXTRAPOLATED INDEPENDENT
VARIABLES TO DEPENDENT VARIABLE);
GENETIC ALGORITHMS; DECISION TREES,
AND OTHERS
60. RAPIDMINER STUDIO: DATAMINING (CONT.)
• TEXTUAL ANALYSIS: INDUCTIVE
CLUSTERING (VIA K-MEANS CLUSTERING)
FOR TEXTUAL AND QUANTITATIVE VARIABLE
DATA (AND OUTPUT AS CORRELATION
MATRICES AND CENTROID PLOT VIEWS);
DOCUMENT VECTOR MODELS
• SENTIMENT ANALYSIS IN TEXT CORPUSES
ADDITIONAL GENERAL FEATURES
• EASY GRAPHICAL USER INTERFACE (GUI)
• END-TO-END SEQUENCE FROM DATA
PROCESSING TO MODELING TO FINAL
ANALYTICAL OUTPUTS
• MODEL CROSS-VALIDATION AND
PERFORMANCE METRICS
• VARIOUS INTERMEDIATE AND FINALIZED
DATA VISUALIZATIONS
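• A TOY VERSION OF THE K-MEANS CLUSTERING NAMED ABOVE, ON ONE-DIMENSIONAL NUMERIC RESPONSES (THE VALUES AND K ARE HYPOTHETICAL; RAPIDMINER'S OWN IMPLEMENTATION IS FAR RICHER):

```python
# Minimal k-means on 1-D data: assign each value to its nearest centroid,
# then recompute centroids as cluster means, and repeat.

def kmeans(values, k, iterations=20):
    # Seed centroids by spreading picks across the sorted values.
    centroids = sorted(values)[:: max(1, len(values) // k)][:k]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

centroids, clusters = kmeans([1.0, 1.2, 0.8, 9.8, 10.1, 10.4], k=2)
```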
62. RESEARCHER SUBJECTIVITY
• EVEN MACHINE-COLLECTED AND (PARTIALLY) MACHINE-PROCESSED INFORMATION HAS TO
ULTIMATELY BE ANALYZED BY PEOPLE
• THE IMPORTANCE OF DOMAIN EXPERTISE; THE IMPORTANCE OF TRAINED NAIVETÉ
• RESEARCHERS ENGAGE IN…
• SOME MANUAL “CLOSE READING” INTERPRETATION OF TEXTUAL DATA
• DECIDING ASSERTABILITY (BASED ON THE EVIDENCE)
63. MITIGATIONS FOR HUMAN RESEARCHER
SHORTCOMINGS
(EXPECTATIONS, EGOS, COGNITIVE BIASES)
• STRENGTH OF THE SURVEY DESIGN, RANDOM SAMPLING, MULTIPLE DATA VISUALIZATIONS
• RESEARCHER TRAINING: HEALTHY SKEPTICISM, NOT FINALIZING THE DATA UNTIL EVERYTHING
HAS BEEN ANALYZED
• PEER REVIEW PROCESSES FOR PRESENTATIONS AND PUBLICATION
• BROAD (OR LIMITED) PUBLICATION OF DATASETS FOR EXTERNAL CHALLENGE AND
VERIFICATION
• RESEARCH OVERSIGHT
64. THE PROBLEM OF HUMAN MANIPULATION OF DATA
• FREEZING DATA AND DATA RESULTS
(AGAINST CHANGE)
• THE “UNTHINKING DRIVE-BY-GPS
APPROACH”: THE “3 DAYS INTERPRETATION”
(AND THE INSIDIOUSNESS OF PRE-EXISTING
MENTAL MODELS)
• THE DELETION OF DATA RECORDS (WITH
SURVEY ADMIN ACCESS)
• INCORRECT USE OF TECHNOLOGY SYSTEMS
• INCORRECT APPLICATION OF PROCESSES
• MISREADING DATA (GIVEN ITS COMPLEXITY)
• THE TWEAKING OF DATA IN MISLEADING
WAYS
• THE NOT WANTING TO SEE
66. CASE 1: A MULTI-COUNTRY MULTI-GRAIN
MULTI-YEAR SURVEY
MAIZE, SESAME, WHEAT, AND CHICKPEA DATA
2015
67. AN OVERVIEW
• THE PRINCIPAL INVESTIGATORS (PI’S) FROM MULTIPLE COUNTRIES
• TOPICS INVOLVING CROPS, CROPPING METHODS AND TECHNOLOGIES, ECONOMIC AND
BUDGETARY ISSUES, FARMER LIFESTYLE ISSUES, SOURCES OF DATA AND TRAINING, AND OTHER
DETAILS
• ENUMERATORS AND TRANSLATORS
• A TARGET POPULATION OF THOSE WHO FARM IN A COUNTRY IN EAST AFRICA
68. A NON-EXPERT AND OUTSIDER APPROACH TO
FIRST-RUN DATA ANALYSIS
ENHANCING THE SURVEY AND SURVEY PROCESSES
• USING COORDINATED UNIVERSAL TIME (UTC) TO DATE-STAMP RESPONSES AND RECODING TO
EASTERN AFRICA TIME (EAT)
• CLARIFYING AGROECOLOGICAL POSITIONS OF RESPONDENTS
• ANALYZING COMPLETION RATES; POINTS AT SURVEY DROPOUT
• ANALYZING QUESTIONS WITH LOW RESPONSE RATES
• LANGUAGE CLARITY ISSUES WITH QUESTIONS
• ENSURING THAT THE QUESTION ASKED AND THE STRUCTURED QUESTION ON QUALTRICS ALIGNED
(NOT AN OPEN-ENDED QUESTION WITH A CLOSE-ENDED AND NON-COMPREHENSIVE DROP-DOWN
SELECTION; NOT TEXT-BASED QUERIES WITHOUT THE TEXT ENABLEMENT)
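• THE UTC-TO-EAT RECODING ABOVE CAN BE SKETCHED WITH THE PYTHON STANDARD LIBRARY (EAT IS A FIXED UTC+3 OFFSET; THE TIMESTAMP VALUE IS HYPOTHETICAL):

```python
# Recode a UTC response timestamp to Eastern Africa Time (UTC+3).
from datetime import datetime, timezone, timedelta

EAT = timezone(timedelta(hours=3), "EAT")

utc_stamp = datetime(2015, 6, 1, 9, 30, tzinfo=timezone.utc)
local = utc_stamp.astimezone(EAT)
print(local.isoformat())  # 2015-06-01T12:30:00+03:00
```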
69. A NON-EXPERT AND OUTSIDER APPROACH TO
FIRST-RUN DATA ANALYSIS (CONT.)
ENHANCING THE SURVEY AND SURVEY PROCESSES (CONT.)
• ENHANCING CLARITY ABOUT SEQUENCES OF ACTIVITIES FOR WHICH THERE ARE OVERLAPS
(NON-MUTUAL EXCLUSIVITY)
• REPORTING NOT INTERPRETING: NOT MAKING ASSUMPTIONS ABOUT TEXTUAL DATA
(REGIONS, NAME BRANDS, FOODS, DIALECTS) THAT MAY BE SYNONYMOUS (CLOSE SPELLINGS,
CLOSE PRONUNCIATIONS) BUT ENABLING THE EXPERTS TO DO THE SORTING
• AVOIDING HIDING OR MASKING DATA, ERRING ON THE SIDE OF LEAVING DATA RAW
• SUGGESTING DROPDOWNS FOR SOME OF THESE…WITH AN ADDED TEXT FIELD FOLLOW-ON TO
CAPTURE ANY OTHER POSSIBLE MISSED DATA
70. A NON-EXPERT AND OUTSIDER APPROACH TO
FIRST-RUN DATA ANALYSIS (CONT.)
ENHANCING THE SURVEY AND SURVEY PROCESSES (CONT.)
• NUMERACY, LITERACY, GEOGRAPHICAL, BUDGETARY, AND OTHER CHALLENGES: MIXED
NUMERICAL PARAMETERS (SUCH AS ML PER HECTARE, ML PER QUINTAL AND THEN OTHER
VARIATIONS)
• USING THE LOWEST COMMON DENOMINATOR TO REPRESENT MEASURES (AND OFFERING
CONVERSIONS AND DEFINED TERMS AND UNDERSTANDINGS)
• CLARIFYING WHEN SOMETHING IS A RATE (SUCH AS A PARTICULAR FOOD CONSUMPTION
AMOUNT OVER TIME VS. A ONE-OFF)
• NOT ENABLING MUTUAL EXCLUSIVITIES
71. A NON-EXPERT AND OUTSIDER APPROACH TO
FIRST-RUN DATA ANALYSIS (CONT.)
ENHANCING THE SURVEY AND SURVEY PROCESSES (CONT.)
• PARTICIPANT PRIVACY: THE HIDING OF RESPONDENT AND ENUMERATOR NAMES (AND THEIR
CONTACT TELEPHONE NUMBERS) IN THE REPORT (BY POINTING ALL PI’S TO THE ARCHIVED
SURVEY FOR SENSITIVE DATA)
• PILOT-TESTING SURVEYS: ENCOURAGING AWARENESS OF THE NEED TO PILOT TEST AND TO
IMPROVE THE SURVEY WITH EACH ITERATION (WITHOUT CHANGING SUBSTANCE, IF
TRENDLINE ANALYSIS IS NEEDED); MAINTAINING A PRISTINE MASTER, COLLECTING DATA
ABOUT CHALLENGES WITH A SURVEY AND ITS USE IN THE WORLD
72. A NON-EXPERT AND OUTSIDER APPROACH TO
FIRST-RUN DATA ANALYSIS (CONT.)
THE APPROACH
• COLLECTION OF ALL RAW DATA (STRUCK THROUGH AND LABELED WITH “FOR POSSIBLE
RESEARCH REFERENCE ONLY” AND HIGHLIGHTED WITH SIGNALING COLORS) AND MAKING
THESE AVAILABLE FOR DOUBLE-CHECKING AND CONTENT ANALYSIS (INCLUDING EVALUATION
IN OTHER DATA ANALYSIS TOOLS)
• CONNECTING RAW DATA WITH PROCESSED DATA (USING Q1, Q2 NAMING PROTOCOLS
THROUGH ALL FILES AND DOCUMENTS FOR EASY LINKING AND SEARCHABILITY)
• AVAILABILITY OF ALL SUPPORT FILES (SUCH AS THOSE FROM WHICH TABLES WERE MADE) TO THE
PIS FOR CLARITY
• SCREENSHOTS FROM WITHIN QUALTRICS FOR QUICK SUMMARY DATA (OF SURVEY RESULTS
FEATURES) BUT ACTUAL ANALYZED DATA FROM DOWNLOADS FOR ALL OTHER TABLES AND CHARTS
(NON-TRIVIAL LIMITATIONS TO PROCESSED DATA USED ALONE, WITHOUT CONSIDERING THE
UNDERLYING RAW DATA)
73. A NON-EXPERT AND OUTSIDER APPROACH TO
FIRST-RUN DATA ANALYSIS (CONT.)
• FULL DOCUMENTATION ABOUT ALL DATA HANDLING (WITH STEP-BY-STEPS FOR TRANSFER TO PIS);
CLEAR DATA PROVENANCE
• ALIGNING THE DATA ANALYSIS WITH THE SEQUENCE OF THE ORIGINAL SURVEYS (FOR COHERENCE)
• ANALYSIS OF ALL ELECTRONIC DATA, NOT THE ONE SET OF MANUALLY COLLECTED DATA FOR CHICKPEA
FARMING (HANDLED BY THE LOCAL PI)
• NON-RELEASE OF PARTIAL SURVEY DATA UNTIL FULL SETS EXTRACTED AND ANALYZED
• SUMMARIZING NUMBERS AND VISUAL DATA IN WORDS; SUMMARIZING TEXTUAL DATA
QUANTITATIVELY AS WELL (FREQUENCY COUNTS) AND IN TABLE AND CHART FORMAT
• ENABLING CLASSIC ELEMENTS (MIN-MAX DATA RANGES, AVERAGES/MEANS, MEDIANS, MODES,
FREQUENCY COUNTS, CLASSIC DATA DISTRIBUTION VISUALIZATIONS LIKE LINE CHARTS, BAR
CHARTS, AND OTHERS); ENABLING PHYSICAL MAPPING AND LESS COMMON DATA
REPRESENTATIONS
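• THE CLASSIC ELEMENTS LISTED ABOVE CAN BE COMPUTED WITH PYTHON'S STATISTICS MODULE (THE RESPONSE VALUES ARE HYPOTHETICAL):

```python
# Min-max range, mean, median, and mode for a numeric survey column.
import statistics

yields = [12, 15, 15, 18, 22, 15, 30]  # hypothetical responses

summary = {
    "min": min(yields),
    "max": max(yields),
    "mean": round(statistics.mean(yields), 2),
    "median": statistics.median(yields),
    "mode": statistics.mode(yields),
}
```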
74. A NON-EXPERT AND OUTSIDER APPROACH TO
FIRST-RUN DATA ANALYSIS (CONT.)
DATA AGGREGATION AND DISAGGREGATION
• BREAKING APART MULTI-PART QUESTIONS FOR ANALYSIS IN SEPARATE PARTS
• REFRAMING COMPLEX QUESTIONS INTO MULTIPLE PARTS FOR FULL DATA EXPLOITATION
• EXTRACTING LARGE MIXED DATASETS FROM SINGLE QUESTIONS
• BREAKING APART MENTIONED FOODS (TO ATOMISTIC ELEMENTS)
• MIXING PERSONAL PROTECTIVE CLOTHING AS FULL SETS (FROM HEAD TO TOE) *AND* AS
INDIVIDUAL ELEMENTS (FROM HEAD TO TOE) (BECAUSE THE DATA MEAN DIFFERENT THINGS)
• OFFERING SIDE-BY-SIDES OF RELATED PROCESSED AND RAW DATA TABLES
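• THE DISAGGREGATION OF MULTI-PART RESPONSES INTO ATOMISTIC ELEMENTS CAN BE SKETCHED AS FOLLOWS (THE RESPONSES AND DELIMITERS ARE HYPOTHETICAL):

```python
# Split multi-item answers on common delimiters, trim, and tally the
# atomistic elements.
from collections import Counter

responses = ["maize, beans", "beans; kale", "maize"]

atoms = Counter()
for r in responses:
    for item in r.replace(";", ",").split(","):
        atoms[item.strip()] += 1
```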
75. A NON-EXPERT AND OUTSIDER APPROACH TO
FIRST-RUN DATA ANALYSIS (CONT.)
• HANDLING TEXTUAL DATA: FAST COUNTING FROM OPEN-ENDED TEXT RESPONSES (IN EXCEL)
FOR FREQUENCY BAR CHARTS; BULLETED SUMMARIES IN SOME ELEMENTS
• AS LITTLE DATA “LOSSINESS” AS POSSIBLE
• FOCUS ON FACTS, NOT INTERPRETATION
• CLEAN LINES OF LOGIC, AVOIDANCE OF CONFLATION (OF VARIABLES, OF CONCEPTS, OF TERMS)
• LOOKING FOR VERBATIM QUOTABLES (SIC), ANOMALIES, AND UNUSUAL CASES; LOOKING FOR
AMBIGUOUS ELEMENTS AND THE NEED TO DISAMBIGUATE (WITHOUT DOING SO INTRUSIVELY)
• EXPLORING FOR CATEGORICAL DATA ANALYSIS POSSIBILITIES (INCLUDING MANY MACHINE-
ENHANCED ONES, SUCH AS DATAMINING AND TEXT MINING TOOLS)
76. A NON-EXPERT AND OUTSIDER APPROACH TO
FIRST-RUN DATA ANALYSIS (CONT.)
• REDUNDANT DATA STORAGE: MULTI-SITE DATA STORAGE
• ADDITIONAL DATA EXPLOITATION: PROVIDING ADDITIONAL SUGGESTIONS FOR DATA
EXPLOITATION (MORE ON THIS IN THE CONCLUDING SUMMARY)
• BEING CLEAR ABOUT THE ADDITIONAL DATA CLEANING, DATA STRUCTURING, AND OTHER WORK
REQUIRED
• BEING CLEAR ABOUT THE QUALIFIERS THAT HAVE TO BE APPLIED TO THESE ASSERTIONS
77. SOME LESSONS LEARNED (+)
• GO AMBITIOUS. GO FOR BREADTH AND DEPTH. GO FOR LEARNING ON MULTIPLE RELATED
DIMENSIONS.
• BE GENEROUS WITH DATA WHERE POSSIBLE. USE SURVEYS TO BENEFIT THE PROFESSIONAL WORK OF ALL
THE PRINCIPAL INVESTIGATORS (PIS) ON THE PROJECT. LEARN FROM EACH OTHER.
• USE DATA TO IMPROVE POLICIES, MARKETS, PRACTICES, AND PEOPLE’S AWARENESS.
• WELL-DESIGNED SURVEYS MAY BE APPLIED ACROSS A RANGE OF CONTEXTS AND LOCALITIES. THEY ARE
NOT JUST MADE FOR SINGLE CONTEXTS. THEY MAY BE USED COLLABORATIVELY IN INTERNATIONAL
CONTEXTS.
• BE SENSITIVE TO ON-GROUND REALITIES. DEVELOP AN EAR FOR HEARING WHAT IS BEING SAID.
78. SOME LESSONS LEARNED (+) (CONT.)
• THERE ARE BENEFITS TO BEING AN ANONYMOUS (BUT FULLY RESPONSIBLE) DATA ANALYST:
• NO BENEFIT TO ME TO HAVE BYLINE CREDIT IN HIGHLY DISPARATE FIELDS (HAVING THE APPLIED
SKILLSET IS MORE IMPORTANT)
• LIMITS RISK IF SOME DATA ARE NOT PROPERLY PROVENANCED OR REPRESENTED IN THE FINAL PRODUCTS
(PUBLICATIONS, PRESENTATIONS, DATASETS)
• MINE IS A SUPPORT POSITION WITH WORK PAID OUT BY THE FUNDER (USAID, IN THIS CASE)
• PURSUING BYLINE CREDIT MAY HINDER THE WORK WITH THE PRINCIPAL INVESTIGATORS (PI’S) AND MY OWN
SUPERVISOR (IT CLEARS THE AIR IF BYLINE CREDIT IS OFF THE TABLE)
• SOME PI’S SEE BYLINE CREDIT AS A FORM OF PAYMENT (IT’S NOT, AND IT’S ALSO NOT GENERALLY
MONETIZABLE…AND IS NOT GENERALLY FUNGIBLE)
• BYLINE CREDIT SOMETIMES BRINGS EGOTISM INTO PLAY (NO THANKS!)
79. SOME LESSONS LEARNED (+) (CONT.)
• AVOID UNNECESSARY FETISHIZING OF SECRECY AND OVER-PROTECTION OF INFORMATION
WHERE THAT IS NOT NECESSARY; USE PROPER JUDGMENT
80. SOME LESSONS LEARNED (-) (CONT.)
• ENUMERATOR TRAINING: TRAINING OF ENUMERATORS (NEUTRALITY, NORMING, NOT USING
“COPY AND PASTE” RESPONSES, PHONETIC SPELLINGS INSTEAD OF ACTUAL SPELLINGS),
PARTICULARLY FOR ENUMERATORS AT A DISTANCE
• SURVEY DESIGN:
• FOLLOW-ON QUESTIONS TREATED AS CENTRAL ONES, RESULTING IN SOME QUESTIONS WITH LOW
RESPONSE RATES
• FAILURE TO USE SURVEY DESIGN TO MITIGATE FOR SOME CHALLENGES IN LITERACY / NUMERACY /
MULTI-LINGUAL APPROACHES / SPELLING
• FAILURE TO USE SURVEY DESIGN TO DEAL WITH PLACE AND ORGANIZATION NAMES (AND
DISAMBIGUATION)
81. SOME LESSONS LEARNED (-) (CONT.)
• ATTENTION TO DETAILS: DOUBLE-CHECK ALL WORK.
• SLEEP ON IT, AND THEN REVISIT ASSUMPTIONS AND UNDERSTANDINGS AND DETAILS.
• ENGAGING ANOMALIES: RUN ANOMALOUS FINDINGS TO GROUND. CORRECT FOR
MISUNDERSTANDING OF DATA.
• PROPER USE OF TOOLS: MAGNIFY ACCURACY AND EFFICIENCY USING VARIOUS TECHNOLOGIES
(COUNTS, SORTS, FILTERING, ORDERING, AND OTHERS).
• TIME AND DATA CRUNCHING: PROPER DATA ANALYSIS TAKES TIME AND EFFORT. THAT SAID, IT
CAN BE DONE FAIRLY QUICKLY AND ACCURATELY, WITH PRACTICE. (KEEP THE CLIENT IN THE LOOP.
PRACTICE. WORK TO IMPROVE.)
83. AN OVERVIEW
• A SURVEY DERIVED FROM ONE CREATED AT STANFORD UNIVERSITY AND FOUND ONLINE (BY A
PRIOR TEAM MEMBER); REVISED, LOCALIZED, AND RE-SEQUENCED (TO NEUTRALLY INTRODUCE
TOPICS FOR RESPONDENTS, TO DEFINE TERMS)
• MULTIPLE OBJECTIVES: IMPROVING CUSTOMER SERVICE, INFORMING IT LEADERSHIP, AND
PUBLICIZING IT SERVICES TO THE BROAD PUBLIC
• STRATIFIED RANDOM SAMPLING
• SURVEY PILOT-TESTING AND REVISION
• ACCESSIBILITY ENABLED (TIMED-TEXT TRANSCRIPTION, ALT-TEXTING)
84. AN OVERVIEW (CONT.)
• VERBATIM QUOTES / STORIES (FOR LATER USE IN PUBLICATIONS AND PRESENTATIONS)
• WHAT EXPERTS BRING TO THE DATA (SUCH AS KNOWING WHAT IS / IS NOT PROBABLE;
CONTEXTUAL INSIGHTS AND SALIENT POINTS; RELATED IN-FIELD RESEARCH; PROFESSIONAL
PERSPECTIVE, AND OTHERS)
• WHAT NOVICES BRING TO THE DATA (SUCH AS FRESH INSIGHTS AND NEW INTERPRETATIONS;
QUESTIONS)
85. SOME LESSONS LEARNED (+)
• BRING IN ALL CLIENTS / STAKEHOLDERS, AND MAKE IT A COLLABORATIVE EFFORT. BE REALLY
AND TRULY OPEN.
• CAPTURE A BROAD VIEW OF THE CAMPUS INFORMATION TECHNOLOGY (IT).
• USE THE SURVEY PARTIALLY TO EDUCATE ABOUT THE WIDE SERVICES AVAILABLE.
• TAKE THE LEARNING FROM THE SURVEY INSTRUMENT, AND LEARN FROM IT, AND IMPROVE
SERVICES.
86. SOME LESSONS LEARNED (-) (CONT.)
• AVOID POLITICIZING DATA
• NOT REMOVING NON-RESPONSE RATES ON QUESTIONS FROM CHARTS
• NOT REMOVING NUMBERS OF RESPONSE COUNTS AND GOING WITH PERCENTAGES ALONE (WHICH ARE
MEANINGLESS WITHOUT A COUNT BASELINE)
• NOT SELECTIVELY RELEASING ONLY LITTLE PARTS OF THE SURVEY FINDINGS; NOT FRAMING FINDINGS TO
THE POINT OF WRITING A PRESS RELEASE FOR PUBLIC RELATIONS (PR) INFORMATION
• NOT DISMISSING WHAT ONE DOESN’T WANT TO SEE / HEAR / KNOW / ADDRESS
• NOT RELEASING INFORMATION TO PLEASE ADMINISTRATORS
• KEEP QUESTION AND SURVEY CONSISTENCY IN TERMS OF TRENDLINE DATA
• AVOID QUESTION INCONSISTENCIES YEAR-TO-YEAR
• WHEN QUOTING, QUOTE VERBATIM AND QUOTE COMPREHENSIVELY
87. SOME LESSONS LEARNED (-) (CONT.)
• IN THE IT SURVEY FOR FACULTY / STAFF / ADMIN, DO NOT SELECTIVELY OMIT IT PERSONNEL
BECAUSE THEY DO HAVE INSIGHTS THAT SHOULD BE SHARED. THEY SHOULD BE INCLUDED IN THE
FULL SET FROM WHICH THE RANDOM STRATIFIED SAMPLE IS DRAWN.
• DO OMIT RECOGNIZABLE DATA FROM THE RESULTS THAT ARE DRAWN FROM THE SURVEY
RESPONSES (TO PROTECT PRIVACY).
• SAMPLE MORE BROADLY IN ORDER TO BE MORE REPRESENTATIVE IN TERMS OF DATA COLLECTION,
AND BE INCLUSIVE.
• ALLOW SPACE (TECHNOLOGICAL, MENTAL, AND OTHERS) FOR A WIDE RANGE OF POTENTIAL
RESPONSES.
88. SOME LESSONS LEARNED (-) (CONT.)
• WORK TO INFORM RESPONDENTS ABOUT WHAT IS GOING ON TECHNOLOGICALLY. PROVIDE
SUFFICIENT DETAILS FOR QUESTIONS. ALLOW INFORMED FEEDBACK.
• USE CORRECT TECHNOLOGICAL PHRASING (TO BE ACCURATE, TO AVOID RIDICULE).
• CREATE OPEN-ENDED QUESTIONS, EVEN IF THEY ALLOW VENTING (INCLUDING SOME F-BOMBS).
• ACTUALLY USE THE DATA COLLECTED. JUSTIFY SURVEY TAKER TIME AND EFFORT (10 – 15
MINUTES).
• USE THE PROPER BALANCE TO INCENTIVIZE PARTICIPATION IN SUCH SURVEYS.
• AVOID SKEW IN THE DESIGN. DO NOT DRIVE TRAFFIC TO DESIRED RESPONSES.
89. SOME LESSONS LEARNED (-) (CONT.)
• LEARN FROM ITEM ANALYSIS OF THE SURVEY QUESTIONS. RECORD CRITIQUES OF THE SURVEY,
SO THE INSTRUMENT MAY BE IMPROVED FOR THE NEXT ROUNDS.
• KEEP A PRISTINE MASTER SURVEY.
• ARCHIVE SURVEYS FOR WHICH DATA HAS BEEN COLLECTED. DO NOT WRITE OVER A SURVEY
WHICH ALREADY HAS DATA COLLECTED.
91. FULL EXPLOITATION
• PURPOSEFUL DATA COLLECTION (AND TRACE AND METADATA): DESIGN THE SURVEY
INSTRUMENTS AND METHODS IN ORDER TO COLLECT AS MUCH RELEVANT DATA AS POSSIBLE.
STAY ATTENTIVE TO WHAT IS KNOWABLE, WHETHER BY INTENTION OR ACCIDENT. STAY
ATTENTIVE TO “TRACE” DATA, INFORMATION THAT IS A BYPRODUCT OF PROCESSES.
• BE AWARE THAT YOU WILL ALWAYS RETRIEVE MORE INFORMATION THAN YOU INTENDED (OR WERE
AWARE OF). SOME OF THAT INFORMATION WILL BE PROFESSIONALLY AND ETHICALLY EXPLOITABLE.
WHAT IS NOT PROFESSIONALLY / ETHICALLY EXPLOITABLE SHOULD BE ARCHIVED SECURELY AND
LEFT ALONE. THERE ARE LIMITS TO EXPLOITATION.
• MULTIPLE INFORMATION STREAMS: CONSIDER HOW TO CROSS-REFERENCE TRUSTED
INFORMATION IN ORDER TO COME UP WITH FRESH INSIGHTS AND GET CLOSER TO “GROUND
TRUTH”.
92. FULL EXPLOITATION (CONT.)
• BROADER INTERPRETABILITY AND CONTENT ANALYSIS: ENCOURAGE VARIANT POINTS-OF-
VIEW, DISSENSUS, AND WIDE-RANGING INTERPRETATIONS OF THE DATA. TRY TO DEBUNK THE
WORKING THEORIES. SEE HOW THE EMPIRICAL DATA LINES UP. RETHINK THE “GO-TO”
INTERPRETATIONS. ANALYZE ANOMALIES WITH FRESH INTERPRETATIONS.
• APPLY CONTENT ANALYSIS TO RELATED TEXTS.
• SURFACE INSIGHTS FROM STYLOMETRY.
• VARYING UNITS OF ANALYSIS / GRANULARITY: CONSIDER “ISOLATES” FROM THE DATA,
AND INTERPRET WHAT THESE COULD MEAN. EXTRACT CASES. EXTEND THE DATA BY
DISAGGREGATING PARTS AND PIECES. CONSIDER SYSTEMS AND CONTEXT ISSUES.
92
93. FULL EXPLOITATION (CONT.)
• TRENDS OVER TIME: CONSIDER HOW TO EXPLOIT PRIOR, RELATED SURVEY DATA IN ORDER TO
MAKE ASSERTIONS OF TRENDS OR CHANGES-OVER-TIME. (THERE IS ALWAYS A TIME ELEMENT
TO DATA.)
94. FULL EXPLOITATION (CONT.)
• BROADER WAYS OF KNOWING: CONSIDER OTHER RESEARCH ANALYSIS METHODS AND
WAYS OF KNOWING…SUCH AS SOCIAL NETWORK GRAPHING, MATRIX QUERIES, MAPPING
TO PHYSICAL LOCATIONS, INTRA-RESPONDENT INSIGHTS, DATA MINING, AUTOMATED
(UNSUPERVISED) TEXT ANALYSIS, AND SO ON.
• SEPARATE THE DIFFERENT RESEARCH APPROACHES.
• DOCUMENT ALL STEPS. SAVE ALL INTERMEDIATE VERSIONS OF FILES AND DATASETS.
• MAKE SURE THAT CLARITY AND COHERENCE ARE NEVER LOST.
• BE CLEAR WHERE EVERY PIECE OF DATA BEING USED COMES FROM…AND HOW TRUSTWORTHY
THAT DATA IS. (CONFIDENCE MATTERS. UNDERCLAIM RATHER THAN OVERCLAIM.)
• ADDITIONAL LEADS: PURSUE ADDITIONAL LEADS FROM THE INFORMATION BY FORMULATING
NEW HYPOTHESES AND METHODS FOR RESEARCH. EXPLORE WHAT IS KNOWABLE.
95. FULL EXPLOITATION (CONT.)
• SECOND- AND THIRD-ORDER EFFECTS: BE MINDFUL OF IMPLICATIONS OF THE DATA AND
HOW THAT DATA MAY BE PERCEIVED, USED, AND FRAMED. ALL DATA HAVE POLITICAL
IMPLICATIONS, AND IT IS IMPORTANT TO BE AWARE OF THOSE WHILE BEING DISCIPLINED
ENOUGH NOT TO CHANGE UP THE FINDINGS TO FIT A POLITICAL MOTIVE. (WORK BEYOND
THE RESEARCHER’S OWN LIMITS AND SUBJECTIVITY BY PUTTING INTO PLACE CHECKS AND
METHODS…)
• IMPROVED RESEARCH TOOLS AND METHODS: USE THE SURVEY FINDINGS TO SHARPEN
RESEARCH TOOLS (AND METHODS) FOR MORE ACCURATE DATA COLLECTION IN FUTURE
ITERATIONS.
96. FULL EXPLOITATION (CONT.)
• GETTING MORE EYES ON THE DATA: IF THE RESEARCH CONTEXT ALLOWS, SHARE THE
DATASET…AND SEE WHAT OTHERS (TRAINED RESEARCHERS AND THE BROADER PUBLIC—
OCCASIONALLY) MAY FIND.
• USE PROPER NAMING PROTOCOLS FOR ALL DATA. MAINTAIN RECORDS AND THE PROPER RELEASES.
ARCHIVE THE DATA IN A FUTURE-PROOFED WAY FOR POSSIBLE FUTURE RE-USE IN OTHER RESEARCH
CONTEXTS.
• EXPLOIT THE INSIGHTS OF BOTH EXPERTS AND AMATEURS / NOVICES.
• NO UNETHICAL, UNPROFESSIONAL, UNAPPROVED, PARTICIPANT-UNINFORMED, OR
NON-COMPLIANT (PER HUMAN SUBJECTS RESEARCH REVIEW STANDARDS) EXPLOITATION OF DATA
97. REALITY CHECKS & CAVEATS
• PRINCIPAL INVESTIGATOR (PI) COMFORT LEVELS: MOST RESEARCHERS WILL NOT MOVE BEYOND
WHERE THEY’RE COMFORTABLE IN TERMS OF DATA ANALYSIS AND PROCESSING.
• ALSO, CUSTOMERS FOR THE DATA ARE POLITICAL INDIVIDUALS AND WILL NOT GENERALLY TAKE RISKS
WITH DATA. POLITICAL SURVIVAL AND COMFORT ARE TOP-OF-MIND OFTENTIMES.
• COSTS / REQUIRED INVESTMENTS: ADDITIONAL ANALYTICS WILL REQUIRE HYPOTHESIZING AND
SETTING UP TEST VALUES. ALL TYPES OF DATA PROCESSING AND ANALYSIS REQUIRE NEW WAYS
OF STRUCTURING THE CAPTURED DATA FOR QUERYING, ANALYSIS, AND VISUALIZATION. THE COST
OF ANALYSIS IS GENERALLY NON-TRIVIAL. ADDITIONAL WORK REQUIRES PI INTEREST AND FOCUS.
• RELEVANCE AND APPLICABILITY: IT IS IMPORTANT TO TIE NEW FINDINGS TO APPLICABLE
DECISIONS AND ACTIONS. IT IS IMPORTANT TO GET PAST THE “SO WHAT?”
98. REALITY CHECKS & CAVEATS (CONT.)
• CONFERENCES & PUBLISHING VENUES: MOST VENUES FOR PRESENTATION AND PUBLISHING
ARE ACCUSTOMED TO CERTAIN TYPES OF DATA ONLY; ANYTHING OUT-OF-THE-ORDINARY WILL
REQUIRE MORE TEXTUAL SETUP AND EXPLANATION (AND DEFENSE). NOVELTY DOES HAVE
SOME INHERENT VALUE, THOUGH.
99. CONTACT AND CONCLUSION
• DR. SHALIN HAI-JEW, INSTRUCTIONAL DESIGNER, ITAC, K-STATE
• 212 HALE / FARRELL LIBRARY
• 1117 MID-CAMPUS DRIVE NORTH, MANHATTAN, KS 66506-0110
• SHALIN@K-STATE.EDU
• 785-532-5262
THANKS TO DR. SUBRAMANYAM BHADRIRAJU (K-STATE) FOR THE INVITATION TO WORK ON HIS
PROJECT AND HIS GENEROSITY IN ALLOWING ME TO USE SOME PROJECT-RELATED LEARNING
TO INFORM THIS PRESENTATION. THE ACTUAL PRESENTATION IS ACHIEVED WITH LIVE ACCESS
TO VARIOUS SYSTEMS AND DATA; THE ONLINE VERSION HERE LACKS THE VISUALS AND
SEQUENCING FROM THE F2F PRESENTATION.