Every researcher is a cyborg! Academic researchers engage in various sorts of research in vitro (in glass) and in vivo (in the living body): they conduct experimental laboratory work and analyze data from natural in-world experiments. In between, many conduct surveys, focus groups, interviews, and other types of research. In the computer-assisted qualitative data analysis software (CAQDAS) space, NVivo is one of the foremost tools, enabling the creation of manual codebooks, multimedia analysis, and various forms of “auto” or unsupervised machine learning. NVivo works as a “database” for structured and unstructured (multimedia) data, and it enables the drawing of content from various social media sites. Technologies augment human analytical capabilities in both the qualitative and quantitative research spaces. This presentation demonstrates some of the capabilities of NVivo and addresses how a researcher is changed by the computational capabilities they harness.
Creating Seeding Visuals to Prompt Art-Making Generative AIs – Shalin Hai-Jew
Art-making generative AIs have come to the fore. A basic work pipeline typically starts with a text prompt that yields a generated image; that image may then be used to seed further iterations. Deep Dream Generator (DDG) also enables “modifiers” of various types (artist styles, visual adjectives, and others) to be applied in addition to the text prompt.
Another approach involves beginning with a “seeding image,” a born-digital or digitized (born-analog) visual on which AI-generated art may be based for a multi-channel and multi-modal prompt. This slideshow provides some observations of how to think about seeding images, particularly in terms of how the DDG handles them, with its “algorithmic pareidolia” (“Deep Dream,” Wikipedia, July 3, 2023).
Human art-making is often about sparking mass-scale conversations. Artists are thought to help bridge humanity into the future. Whether generative AI art enables this is still unclear.
LIWC-ing at Texts for Insights from Linguistic Patterns – Shalin Hai-Jew
Since the mid-1990s, researchers have been using the Linguistic Inquiry and Word Count (LIWC pronounced “luke”) software tool to explore various text corpora for hidden insights from linguistic patterns. The LIWC tool has evolved over the years. Simultaneously, research using computational text analysis has evolved and shed light on areas of deception, threat assessment, personality, predictive analytics, and other areas. This presentation will highlight some of the applications of LIWC in the research literature and showcase the tool on some original text sets.
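As a rough illustration of what LIWC-style analysis computes, here is a minimal Python sketch of dictionary-based category counting. The mini-dictionary and category names below are hypothetical; the actual LIWC dictionaries are proprietary and far more extensive.

```python
# A minimal sketch of LIWC-style word counting, assuming a toy dictionary.
from collections import Counter
import re

# Hypothetical mini-dictionary: category -> word stems ("*" = prefix match)
CATEGORIES = {
    "negemo": ["hate", "worr*", "fear*"],
    "posemo": ["love", "happ*", "great"],
    "social": ["friend*", "talk*", "we", "they"],
}

def liwc_counts(text):
    """Return per-category hits as percentages of total word count."""
    words = re.findall(r"[a-z']+", text.lower())
    hits = Counter()
    for w in words:
        for cat, patterns in CATEGORIES.items():
            for p in patterns:
                if (p.endswith("*") and w.startswith(p[:-1])) or w == p:
                    hits[cat] += 1   # a word may score in several categories
                    break
    total = len(words) or 1
    return {cat: 100.0 * n / total for cat, n in hits.items()}

print(liwc_counts("We talked with friends and felt happy, not worried."))
```

The real tool reports dozens of validated categories (function words, affect, cognitive processes, etc.) rather than this toy set, but the underlying computation is the same normalized category counting.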
Exploring the Deep Dream Generator (an Art-Making Generative AI) – Shalin Hai-Jew
The Deep Dream Generator grew out of the DeepDream technique created by Google engineer Alexander Mordvintsev in 2015. It has a public-facing instance at https://deepdreamgenerator.com/, which enables people to use text prompts and image prompts (individually or in combination) to inspire the art-generating generative AI to output images. This work highlights some process-based walk-throughs of the tool, some practical uses, some lightweight art learning, some aspects of the online social community on the platform, and other insights. Some works by the AI prompted by the presenter may be seen here: https://deepdreamgenerator.com/u/sjjalinn.
(This is the first draft of a slideshow that will be used in a conference later in the year.)
Ethical Considerations of Qualitative Research – N. Mach
Ethical Considerations can be specified as one of the most important parts of the research. ... Research participants should not be subjected to harm in any way whatsoever. Respect for the dignity of research participants should be prioritized. Full consent should be obtained from the participants prior to the study. (Research Methodology)
My presentation at The Richmond Data Science Community (Jan 2018). The slides are slightly different from what I presented last year at The Data Intelligence Conference.
How do we protect users' privacy when building large-scale AI-based systems? How do we develop machine-learned models and systems that take fairness, accountability, and transparency into account? With the ongoing explosive growth of AI/ML models and systems, these are some of the ethical, legal, and technical challenges encountered by researchers and practitioners alike. In this talk, we will first motivate the need for adopting a "fairness and privacy by design" approach when developing AI/ML models and systems for different consumer and enterprise applications. We will then focus on the application of fairness-aware machine learning and privacy-preserving data mining techniques in practice, by presenting case studies spanning different LinkedIn applications (such as fairness-aware talent search ranking, privacy-preserving analytics, and LinkedIn Salary privacy & security design), and conclude with the key takeaways and open challenges.
Virginia Dignum – Responsible Artificial Intelligence – NEXT Conference
As Artificial Intelligence (AI) systems are increasingly making decisions that directly affect users and society, many questions arise across social, economic, political, technological, legal, ethical, and philosophical issues. Can machines make moral decisions? Should artificial systems ever be treated as ethical entities? What are the legal and ethical consequences of human enhancement technologies, or cyber-genetic technologies? How should moral, societal, and legal values be part of the design process? In this talk, we look at ways to ensure ethical behaviour by artificial systems. Given that ethics are dependent on the socio-cultural context and are often only implicit in deliberation processes, methodologies are needed to elicit the values held by designers and stakeholders, and to make these explicit, leading to better understanding of and trust in artificial autonomous systems. We will focus in particular on the ART principles for AI: Accountability, Responsibility, Transparency.
"Introduction to Research Methodology" covers various aspects of research methodology. The presentation begins with an introduction to research methodology, highlighting its systematic approach and techniques for conducting research. It emphasizes the importance of research methodology in ensuring valid, reliable, and credible research findings. The key components of research methodology, such as research design, sampling techniques, data collection methods, and data analysis techniques, are discussed in detail. The presentation explores quantitative and qualitative research methods, as well as mixed methods research, which combines both approaches. Ethical considerations in research, including informed consent, confidentiality, and protection of participants' rights, are emphasized. The concepts of validity and reliability in research are explained, stressing their significance in ensuring accurate and consistent results. The role of research ethics committees, also known as institutional review boards (IRBs), in overseeing ethical research practices is highlighted. The presentation concludes by underscoring the importance of effective research reporting, emphasizing the need for clear and structured research reports to share research outcomes with the scientific community and a wider audience.
This paragraph summarizes the content covered in the 15 slides of the "Introduction to Research Methodology" presentation.
Qualitative data analysis: many approaches to understand user insights – Agnieszka Szóstek
The fifth lecture at the HIT Lab, University of Canterbury, New Zealand, was all about how important it is to run a proper analysis of qualitative data. We discussed the value of looking at data from an individual (phenomenological) perspective versus a combined (reductionist) perspective. But we agreed that, regardless of the chosen approach, it is crucial to look at the data from more than one perspective to be sure the interpretation is not biased by the researcher's own view of the world.
These slides cover current issues in nursing research and envision its future scope: the journey from nursing research to nurse scientist.
Tools and techniques in qualitative and quantitative research – Deepikakohli10
The presentation is about different tools and techniques used in research. It will help students, teachers, researchers, and teacher educators select appropriate tools and techniques for their research purposes.
Long nonfiction chapters are not in style and may never have been. While nonfiction book chapters average about 4,000–7,000 words in length, some run to several times that upper bound. The explanation is that there is some irreducible complexity in what the chapter addresses that cannot be handled in shorter form. This slideshow explores some methods for writing longer chapters while still maintaining coherence, focus, and reader interest…and while using some technological tools to write and edit more efficiently.
Overcoming Reluctance to Pursuing Grant Funds in Academia – Shalin Hai-Jew
Starting as an organization’s new grant writer can be a challenge, especially in a case where there has been a time lapse since the last one left. People get out of the habit of pursuing grant funds. This slideshow addresses some of the reasons for such reluctance and proposes some ways to mitigate these.
Writing grants is one common way that those in institutions of higher education may acquire funds—small and big, one-off and continuing—to conduct research; hire faculty, researchers, learners, and others; update equipment; update existing buildings or build new ones; and achieve other work. This slideshow explores some aspects of the work of grant writing in the present moment in higher education.
Contrasting My Beginner Folk Art vs. Machine Co-Created Folk Art with an Art-... – Shalin Hai-Jew
The SARS-CoV-2 pandemic inspired several years of experimentation with common or folk art, involving mixed media, alcohol ink painting, and other explorations. Then, with the emergence of art-making generative AIs, there were further experiments, particularly with one that enables generation of visuals from scanned art and photos, text prompts, style overlays, and text-based visual modifiers. While both types of artmaking are emotionally satisfying and helpful for stress management, there are some contrasting differences. This exploratory slideshow explores some of these differences in order to partially shed light on the informal usage of an art-making generative AI (artificial intelligence).
Common Neophyte Academic Book Manuscript Reviewer Mistakes – Shalin Hai-Jew
The work of academic book manuscript reviewing, most often as a volunteer, is a common academic practice. The presenter served as a neophyte reviewer for some years before settling into this invited volunteer work over several decades. There have been lessons learned over time about avoidable mistakes…from both experience and observation.
Fashioning Text (and Image) Prompts for the CrAIyon Art-Making Generative AI – Shalin Hai-Jew
CrAIyon (formerly DALL·E mini, a nod to Salvador Dalí) is a web-facing art-making generative AI tool online (https://www.craiyon.com/) that enables the use of text (and image) prompts for the creation of watermarked, lightweight visuals. Counterintuitively, the rough visuals are much more usable for recombination, remixing, and recreation into usable digital visuals for various digital learning objects. The textual prompts are not particularly intuitive because of how the generative AI program was trained on mass-scale visuals. There is an art, and occasional indirection, to reworking prompts after each try, with the resulting nine-image proof sheets that CrAIyon outputs. The tool can be used iteratively for different outputs.
The tool sometimes turns out serendipitous surprises, including an occasional work so refined that it can be used / shared almost unedited. One challenge in using CrAIyon comes from its request for credit (from all non-subscribers to the service). Another comes from the visual watermarking (an orange crayon at the bottom right of the image). However, this tool is quite useful for practical applications if one is willing to engage in deep digital image editing (Adobe Photoshop, Adobe Illustrator).
Augmented Reality in Multi-Dimensionality: Design for Space, Motion, Multiple... – Shalin Hai-Jew
Augmented reality (AR)—the use of digital overlays over physical space—manifests in a wide range of spaces (indoor, outdoor; virtual) and ways (in real space (with unaided human vision); in head gear; in smart glasses; on mobile devices, and others). There are various authoring technologies that enable the making of AR experiences for various users. This work uses a particular tool (Adobe Aero®) to explore ways to build AR for multiple dimensions, including the fourth dimension (motion, changes over time).
Based on the respective purposes of the AR experience, some basic heuristics are captured for (1) space design, (2) motion design, (3) multiple-perception design (sight, smell, taste, sound, touch), and (4) virtual and tangible interactivity.
Some Ways to Conduct SoTL Research in Augmented Reality (AR) for Teaching and... – Shalin Hai-Jew
One of the extant questions about augmented reality (AR) is how (in)effective it is for the teaching and learning in various formal, nonformal, and informal contexts. The research literature shows mixed findings, which are often highly context-based (and not generalizable). There are some non-trivial costs to the design/development/deployment of AR for teaching and learning. For the users, there is cognitive load on the working memory [(1) extraneous/poor design, (2) intrinsic/inherent difficulty in topic, and (3) germane/forming schemas]. For teachers, there are additional knowledge, skills, and abilities / attitudes (KSAs) that need to be brought to bear.
Augmented Reality for Learning and Accessibility – Shalin Hai-Jew
Recently, the presenter conducted a systematic review of the academic literature and an environmental scan to learn how to set up an augmented reality (AR) shop at an institution of higher education. The ambition was to not only set up AR in an accessible and legal way but also be able to test for potential +/- effects of AR on teaching and learning. The research did not go past the review stage, because of a lack of funding, but some insights about accessibility in AR were acquired.
(The visuals are from Deep Dream Generator and CrAIyon.)
Engaging Pixabay as an open-source contributor to hone digital image editing,... – Shalin Hai-Jew
This slideshow describes the author's early experiences with creating two accounts on Pixabay in order to advance digital editing skills in multimedia. The two accounts are located at https://pixabay.com/users/sjjalinn-28605710/ and https://pixabay.com/users/wavegenerics-29440244/ ...
This work explores four main spaces where researchers publish about educational technology: academic-commercial, open-access, open-source, and self-publishing.
Human-Machine Collaboration: Using art-making AI (CrAIyon) as cited work, o... – Shalin Hai-Jew
It is early days for generative art AIs. What are some ways to use these to complement one's work while staying legal (legal-ish)?
Correction: .webp is a raster format
Getting Started with Augmented Reality (AR) in Online Teaching and Learning i...Shalin Hai-Jew
University creative shops are exploring whether they can get into the game of producing AR-enhanced experiences: campus tours, interactive gaming, virtual laboratories, exploratory art spaces, simulations, design labs, online / offline / blended teaching and learning modules, and other AR applications.
This work offers a basic environmental scan of the AR space for online teaching and learning, and it includes pedagogical design leads from the current research, technological knowhow, hands-on design / development / deployment of learning objects, and online teaching and learning methods.
Co-Creating Common Art with the CrAIyon AI – Shalin Hai-Jew
This slideshow contains a variety of images created using the CrAIyon AI...based on seeding terms. This work asks questions about common art in an age of AI.
This is the revised intro to Adobe Animate set of notes used in a training in late June 2022. The Word version is downloadable from www.k-state.edu/ID/AdobeAnimateHandout.docx, with the motion available from the animated .gifs.
"Drift" is the latest in the alcohol ink drip playing series. After reaching the first learning plateau a year and a half in, I am finding second wind. This is all still fun.
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake – Walaa Eldin Moustafa
Dynamic policy enforcement is becoming an increasingly important topic in today’s world, where data privacy and compliance are a top priority for companies, individuals, and regulators alike. In these slides, we discuss how LinkedIn implements a powerful dynamic policy enforcement engine, called ViewShift, and integrates it within its data lake. We show the query engine architecture and how catalog implementations can automatically route table resolutions to compliance-enforcing SQL views. Such views have a set of very interesting properties: (1) They are auto-generated from declarative data annotations. (2) They respect user-level consent and preferences. (3) They are context-aware, encoding a different set of transformations for different use cases. (4) They are portable; while the SQL logic is implemented in only one SQL dialect, it is accessible in all engines.
#SQL #Views #Privacy #Compliance #DataLake
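The idea of auto-generating a compliance-enforcing view from declarative column annotations can be sketched roughly as follows. This Python sketch is illustrative only: the table name, annotation keys, and masking rules are hypothetical, not LinkedIn's actual ViewShift design.

```python
# Hypothetical sketch: build a compliance-enforcing SQL view from
# per-column annotations (annotation names and masks are illustrative).

MASKS = {
    "email": "sha2({col}, 256)",   # pseudonymize sensitive columns
    "none": "{col}",               # pass through non-sensitive columns
}

def compliance_view(table, columns):
    """columns: list of (name, annotation) pairs -> CREATE VIEW statement."""
    exprs = []
    for name, ann in columns:
        template = MASKS.get(ann, "NULL")  # unknown annotations are nulled out
        exprs.append(f"{template.format(col=name)} AS {name}")
    return (f"CREATE VIEW {table}_compliant AS\n"
            f"SELECT {', '.join(exprs)}\n"
            f"FROM {table}")

print(compliance_view("members", [("id", "none"), ("email", "email")]))
```

A catalog layer could then resolve queries against `members` to `members_compliant` transparently, which is the routing behavior the slides describe.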
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23... – John Andrews
SlideShare Description for "Chatty Kathy - UNC Bootcamp Final Project Presentation"
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
Adjusting primitives for graph: SHORT REPORT / NOTES – Subhajit Sahu
These notes concern primitives used by graph algorithms like PageRank. Compressed Sparse Row (CSR) is an adjacency-list-based graph representation.
Multiply with different modes (map):
1. Performance of sequential vs. OpenMP-based vector multiply.
2. Comparing various launch configs for CUDA-based vector multiply.
Sum with different storage types (reduce):
1. Performance of vector element sum using float vs. bfloat16 as the storage type.
Sum with different modes (reduce):
1. Performance of sequential vs. OpenMP-based vector element sum.
2. Performance of memcpy vs. in-place CUDA-based vector element sum.
3. Comparing various launch configs for CUDA-based vector element sum (memcpy).
4. Comparing various launch configs for CUDA-based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce):
1. Comparing various launch configs for CUDA-based vector element sum (in-place).
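The CSR representation referenced in these notes can be sketched as follows. This is a minimal Python illustration only; the reports themselves concern C++/OpenMP/CUDA implementations.

```python
# A minimal sketch of Compressed Sparse Row (CSR): one offsets array and
# one flat edge-target array, giving contiguous per-vertex neighbor slices.

def to_csr(num_vertices, edges):
    """edges: list of (u, v) pairs -> (offsets, targets) CSR arrays."""
    degree = [0] * num_vertices
    for u, _ in edges:
        degree[u] += 1
    offsets = [0] * (num_vertices + 1)
    for u in range(num_vertices):
        offsets[u + 1] = offsets[u] + degree[u]
    targets = [0] * len(edges)
    fill = offsets[:-1].copy()        # next free slot per vertex
    for u, v in edges:
        targets[fill[u]] = v
        fill[u] += 1
    return offsets, targets

def neighbors(offsets, targets, u):
    """Out-neighbors of u as one contiguous slice."""
    return targets[offsets[u]:offsets[u + 1]]

offsets, targets = to_csr(3, [(0, 1), (0, 2), (2, 0)])
print(neighbors(offsets, targets, 0))
```

The contiguous layout is what makes CSR cache-friendly for the map/reduce primitives benchmarked above.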
Adjusting OpenMP PageRank: SHORT REPORT / NOTES – Subhajit Sahu
For massive graphs that fit in RAM, but not in GPU memory, it is possible to take advantage of a shared-memory system with multiple CPUs, each with multiple cores, to accelerate PageRank computation. If the NUMA architecture of the system is properly taken into account with good vertex partitioning, the speedup can be significant. To take steps in this direction, experiments are conducted to implement PageRank in OpenMP using two different approaches, uniform and hybrid. The uniform approach runs all primitives required for PageRank in OpenMP mode (with multiple threads). On the other hand, the hybrid approach runs certain primitives (i.e., sumAt, multiply) in sequential mode.
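As a rough sketch of how the PageRank iteration factors into such primitives, here is a sequential Python version. The primitive names in the comments are only loosely modeled on the report's sumAt and multiply; the real implementations run these steps in OpenMP (and the actual code is C++).

```python
# A sequential sketch of power-iteration PageRank over a CSR graph,
# with the two primitives named in the report marked in comments.

def pagerank(offsets, targets, n, damping=0.85, tol=1e-10, iters=100):
    """offsets/targets: CSR arrays of the graph's out-edges."""
    ranks = [1.0 / n] * n
    out_deg = [offsets[u + 1] - offsets[u] for u in range(n)]
    for _ in range(iters):
        # "multiply"-style primitive: per-vertex contribution = rank / degree
        contrib = [ranks[u] / out_deg[u] if out_deg[u] else 0.0
                   for u in range(n)]
        # "sumAt"-style primitive: accumulate contributions along edges
        new = [(1 - damping) / n] * n
        for u in range(n):
            for v in targets[offsets[u]:offsets[u + 1]]:
                new[v] += damping * contrib[u]
        err = sum(abs(a - b) for a, b in zip(new, ranks))  # L1 error
        ranks = new
        if err < tol:
            break
    return ranks

# 3-vertex cycle 0 -> 1 -> 2 -> 0: ranks converge to about 1/3 each
print(pagerank([0, 1, 2, 3], [1, 2, 0], 3))
```

In the uniform approach both loops would run under OpenMP threads; in the hybrid approach the cheaper primitives stay sequential to avoid parallel overhead.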
Global Situational Awareness of A.I. and Where It's Headed – vikram sood
You can see the future first in San Francisco.
Over the past year, the talk of the town has shifted from $10 billion compute clusters to $100 billion clusters to trillion-dollar clusters. Every six months another zero is added to the boardroom plans. Behind the scenes, there’s a fierce scramble to secure every power contract still available for the rest of the decade, every voltage transformer that can possibly be procured. American big business is gearing up to pour trillions of dollars into a long-unseen mobilization of American industrial might. By the end of the decade, American electricity production will have grown tens of percent; from the shale fields of Pennsylvania to the solar farms of Nevada, hundreds of millions of GPUs will hum.
The AGI race has begun. We are building machines that can think and reason. By 2025/26, these machines will outpace college graduates. By the end of the decade, they will be smarter than you or I; we will have superintelligence, in the true sense of the word. Along the way, national security forces not seen in half a century will be unleashed, and before long, The Project will be on. If we’re lucky, we’ll be in an all-out race with the CCP; if we’re unlucky, an all-out war.
Everyone is now talking about AI, but few have the faintest glimmer of what is about to hit them. Nvidia analysts still think 2024 might be close to the peak. Mainstream pundits are stuck on the wilful blindness of “it’s just predicting the next word”. They see only hype and business-as-usual; at most they entertain another internet-scale technological change.
Before long, the world will wake up. But right now, there are perhaps a few hundred people, most of them in San Francisco and the AI labs, that have situational awareness. Through whatever peculiar forces of fate, I have found myself amongst them. A few years ago, these people were derided as crazy—but they trusted the trendlines, which allowed them to correctly predict the AI advances of the past few years. Whether these people are also right about the next few years remains to be seen. But these are very smart people—the smartest people I have ever met—and they are the ones building this technology. Perhaps they will be an odd footnote in history, or perhaps they will go down in history like Szilard and Oppenheimer and Teller. If they are seeing the future even close to correctly, we are in for a wild ride.
Let me tell you what we see.
Analysis insight about a Flyball dog competition team's performance – roli9797
Insight from my analysis of a Flyball dog competition team's performance last year. Find more: https://github.com/rolandnagy-ds/flyball_race_analysis/tree/main
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag... – sameer shah
"Join us for STATATHON, a dynamic 2-day event dedicated to exploring statistical knowledge and its real-world applications. From theory to practice, participants engage in intensive learning sessions, workshops, and challenges, fostering a deeper understanding of statistical methodologies and their significance in various fields."
Techniques to optimize the PageRank algorithm usually fall into two categories. One is to try reducing the work per iteration, and the other is to try reducing the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged has the potential to save iteration time. Skipping in-identical vertices, with the same in-links, helps reduce duplicate computations and thus could help reduce iteration time. Road networks often have chains which can be short-circuited before PageRank computation to improve performance; final ranks of chain nodes can be easily calculated. This could reduce both the iteration time and the number of iterations. If a graph has no dangling nodes, the PageRank of each strongly connected component can be computed in topological order. This could help reduce the iteration time and the number of iterations, and also enable multi-iteration concurrency in PageRank computation. The combination of all of the above methods is the STICD algorithm [sticd]. For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
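One of the optimizations above, skipping computation on vertices which have already converged, can be sketched as follows. This is an illustrative Python version under simplifying assumptions (a frozen vertex is never re-checked here; real implementations handle late-arriving changes from neighbors more carefully).

```python
# Sketch of PageRank with per-vertex convergence skipping: once a vertex's
# rank change falls below tol, it is frozen and skipped in later iterations.

def pagerank_skip(graph, damping=0.85, tol=1e-12, iters=200):
    """graph: {vertex: [out-neighbors]} -> {vertex: rank}."""
    n = len(graph)
    ranks = {v: 1.0 / n for v in graph}
    converged = {v: False for v in graph}
    incoming = {v: [] for v in graph}
    for u, outs in graph.items():
        for v in outs:
            incoming[v].append(u)
    for _ in range(iters):
        changed = False
        new = dict(ranks)
        for v in graph:
            if converged[v]:
                continue                      # skip already-converged vertex
            r = (1 - damping) / n + damping * sum(
                ranks[u] / len(graph[u]) for u in incoming[v] if graph[u])
            if abs(r - ranks[v]) < tol:
                converged[v] = True           # freeze this vertex's rank
            else:
                changed = True
            new[v] = r
        ranks = new
        if not changed:
            break
    return ranks
```

On graphs where most vertices settle early, the skipped updates are where the iteration-time savings described above come from.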
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf – GetInData
Recently we have observed the rise of open-source Large Language Models (LLMs) that are community-driven or developed by the AI market leaders, such as Meta (Llama3), Databricks (DBRX) and Snowflake (Arctic). On the other hand, there is a growth in interest in specialized, carefully fine-tuned yet relatively small models that can efficiently assist programmers in day-to-day tasks. Finally, Retrieval-Augmented Generation (RAG) architectures have gained a lot of traction as the preferred approach for LLMs context and prompt augmentation for building conversational SQL data copilots, code copilots and chatbots.
In this presentation, we will show how we built upon these three concepts a robust Data Copilot that can help to democratize access to company data assets and boost performance of everyone working with data platforms.
Why do we need yet another (open-source ) Copilot?
How can we build one?
Architecture and evaluation
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Round table discussion of vector databases, unstructured data, ai, big data, real-time, robots and Milvus.
A lively discussion with NJ Gen AI Meetup Lead, Prasad and Procure.FYI's Co-Found
2. PRESENTATION BLURB
Every researcher is a cyborg! Academic researchers engage in various sorts of research in vitro (in the glass) and in vivo (in the living body), or they engage in experimental laboratory work and analyze data in natural in-world experiments. In between, many conduct surveys, focus groups, interviews, and other types of research work. In the computer-assisted qualitative data analysis software (CAQDAS) space, NVivo is one of the foremost tools, enabling the creation of manual codebooks, multimedia analysis, and various forms of “auto” or unsupervised machine learning. NVivo works as a “database” for structured and unstructured data (multimedia). It enables the drawing of content from various social media sites. Technologies augment human analytical capabilities in both the qualitative and quantitative research spaces. This presentation demonstrates some of the capabilities of NVivo. It also addresses how a researcher is changed by the computational capabilities they harness.
3. DEFINITION: CYBORG
Cyborg: “A fictional or hypothetical person whose physical
abilities are extended beyond normal human limitations by
mechanical elements built into the body”
Oxford English Dictionary (2022)
5. IN VITRO VS. IN VIVO
In vitro (in glass)
Research that may be conducted “in glass” test tubes
in laboratories
More typical in the so-called “hard” sciences
In vivo (in living body)
Research that may be conducted based on “in living
body” or in-world natural experiments (based on
observables, scraped data from real life)
More typical in the so-called “soft” sciences
NVivo
6. MIXED METHODS VS. MULTIMETHODOLOGY RESEARCH
Mixed methods research
A combination of qualitative and quantitative “data,
methods, methodologies, and / or paradigms in a
research study or set of related studies”; a type of
multimethodology research (Multimethodology, Jan.
25, 2022)
Multimethodology research
Use of “more than one method of data collection or
research in a research study or set of related
studies” (Multimethodology, Jan. 25, 2022)
7. SOME EXAMPLES OF MULTIMETHODOLOGY (AND MIXED METHODS) RESEARCH
An experimental intervention (quant) and a follow-up online survey (qual) (sequential multimethod research)
A program performance audit based on documentation and data (quant) and interviews (qual) (multi-sourced
data, multimethod research: content analysis, interviews)
A simulation study (quant) combined with social data and social network analysis (qual) (multimethod and multi-
sourced data)
Medical trials of a new drug (quant) along with long-term participant health data and surveys (qual) (multimethod
research)
Longitudinal research combining laboratory-based health data (quant) and surveys (qual) (multimethod research)
Autoethnography or ethnography (qual) studied in the context of external population data (quant) (multi-sourced
data of both types)
8. SOME EXAMPLES OF MULTIMETHODOLOGY (AND MIXED METHODS) RESEARCH (CONT.)
Scientific research in the lab (quant) combined with external focus groups (or interviews or surveys) (qual)
(multimethod and mixed method research)
A quasi-experimental learning intervention (quant / qual) with assessment of grade data (quant)
Learning management system (LMS) data at scale (quant) combined with student surveys (qual) (mixed data)
Social media data (quant) combined with e-Delphi method study (qual) (mixed data)
Student grades (quant) and student survey responses (quant / qual) (mixed data)
Online-based interviews (qual) and sensor data (quant) (multimethods, mixed data sources)
9. SOME EXAMPLES OF MULTIMETHODOLOGY (AND MIXED METHODS) RESEARCH (CONT.)
An oral history project (qual) with computational text analysis (quant / qual) with demographic data (quant)
(mixed data)
Mapping the state of a nation’s research by bibliometrics (quant / qual) and demographic analysis (quant) and
interviews (qual) (multimethod and mixed method)
And innumerable other variations
10. CAQDAS: COMPUTER-ASSISTED QUALITATIVE DATA ANALYSIS SOFTWARE
computer-assisted qualitative data analysis software (CAQDAS)
Includes a wide number of software programs, including…
NVivo
[data exploration with word frequency counts, text searches; matrix queries; qualitative cross-tab analysis; compound queries; coding
queries; coding similarity analysis; manual coding; codebook export; memo export; reports export; machine learning: topic modeling (with
human researcher in-the-loop), sentiment analysis, speaker coding from transcripts, style coding, “NV” coding based on manual codebook;
data visualizations; manual model drawing; automated model drawing, and others]
[runs on Windows, Mac, and servers] [some differing capabilities]
11. A LIGHT COMPARISON / CONTRAST BETWEEN QUANT AND QUAL
APPROACHES
Quantitative research
Epistemological approaches (ways of knowing, ways
of making meaning)
Assumption of objectivity and absolutism, normal curve
to represent populations
Striving for high-rigor and reproducible research
Practical and applied, problem-solving; theoretical
relevance and implications
Qualitative research
Epistemological approaches (ways of knowing, ways
of making meaning)
Assumption of subjectivity and relativism on the part of
researchers
Striving for rich data (coded to saturation)
Practical and applied, problem-solving; also theoretical
relevance and implications
12. A LIGHT COMPARISON / CONTRAST BETWEEN QUANT AND QUAL
APPROACHES (CONT.)
Quantitative research
Experimental research
Gold standard is experimental research
Lab-based
Field-based
High-precision measures, highly defined research
methodologies, high rigor
Qualitative research
Natural experiments, field observations,
Data elicitations through focus groups, interviews
(structured and semi-structured)
Valuing of voice
Informants based on positionality
All content has data value: content analysis, gray
literature, metadata
13. A LIGHT COMPARISON / CONTRAST BETWEEN QUANT AND QUAL
APPROACHES (CONT.)
Quantitative research
Reliance on statistical analysis, descriptive statistics, other
statistical methods, deductive logic
Independent variable(s), dependent variable(s)
Controls for potential other influences (noise)
Evaluate whether p-values justify rejecting null hypotheses
Use randomization for seating panels, participants, and so on
Can go with convenience samples, can go with snowball
sampling, and others, but these are weaker sampling methods,
with room for biasing
Require “power” in terms of numbers for representation
Qualitative research
Reliance on researcher expertise, thematic (and other) coding,
statistical methods
Can learn from small datasets
Can learn from an n = 1
Can learn from individual cases / case studies / groups of cases
Can make case for a construct based on coding similarity
analysis (using Cohen’s Kappa, Kappa Coefficient)
Usually in a range of 0.6 to 0.8, where 1.0 is full agreement on what
is relevant and what is not relevant in the coding
Need to avoid “reification” (assuming an abstraction has
instantiation in concrete reality), “hallucinated” senses of reality
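The Cohen's Kappa calculation behind a coding similarity analysis can be illustrated in a small, stand-alone sketch (this is not NVivo's implementation, and the two coders' "relevant" / "irrelevant" labels below are hypothetical):

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Chance-corrected agreement between two coders' category labels."""
    assert len(coder_a) == len(coder_b)
    n = len(coder_a)
    # Observed agreement: fraction of items both coders labeled the same.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Expected agreement by chance, from each coder's label frequencies.
    ca, cb = Counter(coder_a), Counter(coder_b)
    expected = sum(ca[k] * cb[k] for k in set(ca) | set(cb)) / (n * n)
    return (observed - expected) / (1 - expected)

a = ["relevant", "relevant", "irrelevant", "relevant", "irrelevant", "relevant"]
b = ["relevant", "irrelevant", "irrelevant", "relevant", "irrelevant", "relevant"]
print(round(cohens_kappa(a, b), 3))  # → 0.667
```

A value of 1.0 would mean full agreement; values in the 0.6 to 0.8 band are conventionally treated as acceptable, per the slide above.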
14. A LIGHT COMPARISON / CONTRAST BETWEEN QUANT AND QUAL
APPROACHES (CONT.)
Quantitative research
Experimental reproducibility and repeatability
Generalizability of certain standards are met
Qualitative research
Not striving for generalizability but for patterns and
insights
No assumption of being able to totally recreate a
prior qualitative study
May do follow-on studies with the “same” population
15. A LIGHT COMPARISON / CONTRAST BETWEEN QUANT AND QUAL
APPROACHES (CONT.)
Quantitative research
Assumption: Interchangeability of similarly trained
researchers
Integrity required
Complex skills required
Qualitative research
Assumption: A non-interchangeability of researchers
Celebration of the researchers’ unique interpretive lens
Uniqueness of researchers as a strength
Willingness to challenge status quo, cultural understandings;
be transgressive and revolutionary
Openness to novel experiences
Ability to take on challenging work
Control of own cognitive biases, perceptual slants,
preferential thinking
16. A LIGHT COMPARISON / CONTRAST BETWEEN QUANT AND QUAL
APPROACHES (CONT.)
Quantitative research
Work can be challenged:
Repeat of the experimental research but with new data
Finding of errors in the original handling and / or
analysis of the data
Unclear evidentiary chains
Finding of logic errors
Poor methodologies
Identification of research or other fraud
Qualitative research
Work can be challenged:
Finding of incorrect application of logic or theory
Insufficient richness of data
Researcher biographical bias
Poor methodologies
Identification of research or other fraud
17. SOME COMPUTATIONAL DIFFERENCES
Quantitative research
Statistically significant data patterns through…
Cross-tab analysis
Factor analysis
Principal components analysis
Cluster analysis
Network analysis, social network analysis, word networks,
related tags networks, and others
And others
Qualitative research
Focus on natural language analytics
Spoken, written, mixed
Various genres and forms
Harnessing of multimedia, gray literature, various “found”
contents, and others
Data elicitations using computational means
Data patterns through…
Topic modeling
Sentiment analysis
Predictive analysis
Qualitative cross-tab analysis
Text and data mining
18. SOME COMPUTATIONAL DIFFERENCES (CONT.)
Quantitative research
Machine learning
Supervised machine learning
Unsupervised machine learning
Data modeling from machine learning based on training
data (such as for predictive analytics, with automated
creation of confusion matrices and f-scores)
Artificial intelligence (AI)-based “experiential” learning
High performance computing with big data and big data
streams
Qualitative research
Can be applied at scale now
Can “remember” unique coding fists of unique
researchers and apply their coding computationally
19. COMPUTATIONAL INSTRUMENTATION / TOOLS
AND DIGITAL RESOURCES
Quantitative research
Software programs, code, script, macros
Curated datasets
Datasets
Data models
Connected script and datasets
Survey instruments
Interview instruments
Qualitative research
Manual codebook creation, automated codebook
creation (both coded to saturation)
Created with top-down coding (based on theory or
framework or model, or some combination; based on pre-
determined research questions; based on a priori
hypothesizing); bottom-up coding (grounded theory); both
top-down and bottom-up coding
Codebooks named and often with easy-reference acronyms
.qdc format for digital codebook sharing and heritability,
Microsoft Word or LaTeX formats for appendices
20. COMPUTATIONAL INSTRUMENTATION / TOOLS
AND DIGITAL RESOURCES (CONT.)
Quantitative research
Research journals
Field notes
Rubrics
Matrices
Checklists
Qualitative research
Coding dictionaries
Software programs
Curated datasets
Research journals
Field notes
Rubrics
Matrices
Memos
22. TYPICAL RESEARCH DATA SHARING PRACTICES
Quantitative research
Full dataset (into perpetuity, at the time of publication)
Data exploitable (in a constructive sense) for other
analyses (but need to cite the creator of the dataset)
The code used to interact with the data and to create
data visualizations
Sometimes derived or “shadow” datasets
De-identified data (privacy protections)
Canonical collections (image sets, video sets, others) for
further study
Clear evidentiary chains
Qualitative research
Project files sometimes
De-identified data (privacy protections)
Instruments may be shared, like codebooks (manual and
digital) and computer programs
Partnerships more rare than in quant research teams
23. (IN)CONCLUSIVENESS OF RESEARCH FINDINGS?
Quantitative research
Convergence towards a consensus
Not fully definitive for all time (may be overturned at
any point with new research)
No absolute “proof” in most cases but leaning in
certain directions
Even paradigms shift
Reproducibility of computational outcomes given the
same dataset and the same queries or autocoding
processes
Qualitative research
Never an absolute last word, but a momentary
provisional observation for a particular point-in-time
No absolute “proof” in most cases
Even paradigms shift here, too
Reproducibility of computational outcomes given the
same dataset and the same queries or autocoding
processes
24. HUMAN SUBJECTS RESEARCH AND STANDARDS
Professional ethics and regulations / laws that protect the following and more:
IRB (institutional review board) oversight prior, during, and post research
Non-use of duplicity except in rare approved-by-IRB cases
Research value
Legally procured data
Research subjects’ well-being
Informed consent for research subjects, ability to withdraw from the research at any time
Research subjects’ privacy
Data preservation
26. SELECTED QUALITATIVE RESEARCH PRECEPTS
It helps to have broad and general knowledge along with in-depth focused knowledge. All knowledge can inform
the work.
There are data everywhere. Everything is datafy-able.
Everything is culturally informed. Everything is seen through a cultural lens. It helps to be aware of culture, one’s
own and others’.
The data source may be anywhere from [raw and “found” in-world] to [refined, edited, vetted, and “worked
through”].
27. SELECTED QUALITATIVE RESEARCH PRECEPTS (CONT.)
With some work, data may be transcoded to information.
All human creations have potential informational value:
formal published work, gray literature (brochures), private letters, cultural artifacts, artworks, commenting on social media,
building designs, stamps, candy wrappers, private collections of anything, etc.
The informational value may differ based on research context and researcher interests. Different researchers will
extract different meanings from the same dataset.
28. SELECTED QUALITATIVE RESEARCH PRECEPTS (CONT.)
All researchers are subjective. They have built-in biases. They need to be self-aware and control for their own
biases in order to conduct effective research.
In their work, they need to report on their biases and how they mitigate their biases.
Different researchers approaching a particular topic will likely take different approaches and emerge with different findings
(to a degree).
Researchers have their own “coding fists” or “coding hands.” They identify relevant data differently. They create
different coding categories, and these categories may be mutually exclusive or not. They may engage greedy or
frugal coding (whether a coded object can be coded more than once and in different categories).
Computation enables researcher “coding fists” to be preserved and re-used into the future.
29. SELECTED QUALITATIVE RESEARCH PRECEPTS (CONT.)
The research findings are not about generalizing to a population per se but about surfacing relevant insights.
Researchers strive to see differently. One of their “superpowers” is in re-interpreting, at various levels: micro
(ego), meso (group, entity), and macro (larger systems).
Researchers work across cultures and contexts. They are able to disengage from the context in order to view the situation
analytically.
Values (stated and implied) are an integral part of the research.
Qualitative researchers do not assume that the status quo is all as it should be. In qualitative research, advocating
for social change and equity and justice is considered a professional responsibility.
Studies may be disciplinary or interdisciplinary.
30. SELECTED QUALITATIVE RESEARCH PRECEPTS (CONT.)
CAQDAS tools support the human researcher.
The human researcher is foremost in the research and is not displaced by the
technology. However, the human is changed by using technologies, too.
One graduate student wanted to use autocoding alone for her master’s thesis,
without bringing her own expertise to bear. Not a good idea… Unless you can
create the code for the data analytics informed by your knowledge, a generalized
software tool will output generalized insights.
CAQDAS enables scalability of various types of computational analytics. For
example, a human-created manual codebook in NVivo can be applied to a
larger dataset and coded with a Cohen’s Kappa coefficient of 1.0 (albeit in a
machine sensibility, not a human one).
31. TEMPORAL LEGACIES OF QUALITATIVE RESEARCHERS
A lifetime body of work
Particular research works
Unique or powerful contributions to particular
insights, theories, practices, and others
Coining of new terms
Originating new research methods, how research is
operationalized
Research instruments
Ability to reach the lesser-reached
Language skills
Professional and other affiliations
Professional collaborations
Personality, persona, charisma
Promotion of social change, advocacy for certain
values
Effective funding and uses of available resources
Style(s) and aesthetics
And others…
34. WHAT IS “CODING” ANYWAY?
Manual coding
Reading collected data (transcripts, articles, maps,
audio, video, photos, etc.) and identifying elements of
interest and coding them to a codebook in natural
language
Organizing the codebook in a rational order, with
child nodes, grandchild nodes, great grandchild
nodes, etc. (structured codebook)
Automated coding
Distant reading by machine using the following:
Word counts
Algorithmic topic extraction
Application of sentiment dictionaries to text at varying
levels of granularity (sentences, paragraphs, or data
cells…depending on the formatting of the textual data)
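The word-count form of "distant reading" listed above can be approximated in a few lines of standard-library Python (a simplified sketch; the stoplist and sample sentence are illustrative, and real stoplists are much longer):

```python
import re
from collections import Counter

# A tiny illustrative stoplist -- built-in stoplists in CAQDAS tools are far longer.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "that"}

def word_frequencies(text, top_n=5):
    """Tokenize, drop stopwords, count: the simplest form of distant reading."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS).most_common(top_n)

sample = "The coder codes the data, and the codes structure the codebook."
print(word_frequencies(sample, 3))
```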
35. WHAT IS A CODEBOOK ANYWAY?
A basic codebook contains the following: coding nodes (classifications of codes) and descriptions for each node
so that coders understand what information belongs in that classification
A codebook may be hierarchical, with top-level nodes, child nodes, grandchild nodes, and so forth
The nodes may be sectioned based on topics. They may be sectioned in alphabetical order. They may be ordered
with leading 0s. There are many accepted ways for the ordering of the codes.
In table format, a codebook looks like the following (in the simplest construct):
Codes | Descriptions of Coding Categories
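A structured codebook with top-level, child, and grandchild nodes can be modeled as nested code/description entries. The fragment below is hypothetical (not an NVivo file format), just to make the hierarchy concrete:

```python
# A hypothetical fragment of a hierarchical codebook: each node pairs a
# description with optional child nodes, like the table above but nested.
codebook = {
    "Workload": {
        "description": "References to amount or pacing of work",
        "children": {
            "Overload": {"description": "Work exceeds capacity", "children": {}},
            "Balance": {"description": "Sustainable workload", "children": {}},
        },
    },
}

def flatten(nodes, depth=0):
    """Walk parent -> child -> grandchild nodes in document order."""
    for name, node in nodes.items():
        yield depth, name, node["description"]
        yield from flatten(node["children"], depth + 1)

for depth, name, description in flatten(codebook):
    print("  " * depth + f"{name}: {description}")
```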
36. WHY SHOULD A CODEBOOK HAVE A NAME?
A codebook should have a name that describes what is coded by that codebook. The foci and discipline should
be identifiable.
A codebook name should have a clear acronym, for easy reference.
A codebook needs a name so that it is easily citable by other researchers.
A codebook needs a name so that researchers can credit the original codebook instrument creator when they
use the codebook…or when they create a module to add to it, etc.
A codebook should have a name because of how it is used in the research and academic space.
A codebook shows a culmination of expertise…and expert interactions with a sufficient amount of relevant data.
37. THINK IN SEQUENCING
What are data patterns in a particular set of core data files? (word frequency counts, text searches, topic
modeling, and others)
What are proxemic terms around particular names, dates, labeled phenomena, symbols, and others? (proximity
searches)
Who are the different individuals who responded to the survey / focus group / interviews based on demographic
data? Based on topics of interest? Based on general sentiment? (classification sheets w/ demographic
information and case nodes, topic modeling, sentiment analysis, and others)
What are features of the created manual coding? Automated coding? (matrix coding queries)
In a sentiment analysis of a social network’s discussion, what topics are seen in the most positive sentiment?
Which topics are seen in the most negative sentiment? (sentiment analysis, topic modeling)
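The proximity search mentioned above — which terms co-occur near a particular name, date, or labeled phenomenon — can be sketched as a sliding window over tokens (illustrative only; the sample text is hypothetical, and real proximity operators are more capable):

```python
def proximity_terms(text, target, window=3):
    """Collect terms appearing within `window` words of each target occurrence."""
    words = text.lower().split()
    hits = set()
    for i, w in enumerate(words):
        if w == target:
            lo, hi = max(0, i - window), i + window + 1
            hits.update(words[lo:i] + words[i + 1:hi])
    return hits

text = "the committee approved the budget after the budget review meeting"
print(sorted(proximity_terms(text, "budget", 2)))
```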
38. THINK IN SEQUENCING (CONT.)
In conducting a review of the literature, a large number of files have been downloaded from various subscription
and open-source web-facing databases. The research is focused on a particular subset of articles. The researcher
does not want to read all the articles. How can the researcher hone in on the particular works of interest?
(topic modeling by article set; topic modeling by titles and abstracts; word searches in the database of articles)
39. THINK IN SEQUENCING (CONT.)
In a geographical analysis of responses, what are topical and sentiment patterns and attitudes? (classification
sheets, geographical modeling, topic modeling, sentiment analysis)
In creating a team’s consensus codebook, based on collected .nvp and .nvpx project files (or even server files),
how do the various human-generated manual codebooks differ? What are the outlier ideas? (event logging for
objective record of individual researcher / coders and contributions, matrix coding query, coding comparison for
Kappa coefficient, transcoding of project files from / to Windows, Mac, server)
In the “use existing coding patterns” in which “NV” (the software) codes by emulating the human-generated
codebook, various individuals’ and teams’ coding fists are emulated computationally…to scale…to computational
speeds. This enables preservation of people’s points-of-view and coding patterns. What are ways to ensure that a
codebook is coded to saturation, since this feature does not add any new nodes (coding classifications)? (use
existing coding patterns)
40. THINK IN MULTIPLE NVIVO FILES
When working on a large-size or longitudinal project (including doctorate degrees), use a number of files to
achieve your aims.
Make a file for the review of the literature. Make a file for the focus group. Make a file for the fieldwork. Make a file for
the social video analysis. Make a file for the analysis of the geographical maps.
Combine data only when you need to run data queries and / or autocoding on the particular set of information.
Do not clump everything into a large file unless your queries require access to all the included data.
Always have a backup set of files in the cloud or in multiple physical locations (so as not to lose work
accidentally).
41. THINK IN MULTIPLE LANGUAGES
NVivo enables coding in a number of languages:
simplified Chinese
English (US)
English (UK)
French
German
Japanese
Portuguese
Spanish
UTF-8 and UTF-16 enable representation of all languages on the Web and Internet.
42. THINK IN TEAMING
Aim for wide dissensus when originating a team codebook, so that the widest variety of ideas may be captured
initially before there is convergence to a consensus codebook
Aim for narrower consensus when training a team to use a defined codebook on defined data for a sufficiently
high Cohen’s Kappa / Kappa coefficient to establish the validity of a construct
44. ABOUT NVIVO
NVivo is a qualitative data analytics software tool that acts like a database (that enables the storage of structured
and unstructured data, the running of queries, the interaction with data, the drawing of data visualizations, the
export of reports, and so on)
Earlier versions of the software were known as NUD*IST (1981 – 1997); from N4 onward, the
software has been branded NVivo (1997 to present) (“NVivo,” Oct. 14, 2021)
NUD*IST stood for “Non numerical Unstructured Data Indexing Searching and Theorizing software”
45. BASIC SELECTED ANALYTICAL CAPABILITIES OF NVIVO INCLUDE…
Exploration of data
Word frequency count
Text search with various parameters
Similarity cluster analysis
Coding analysis
Matrix coding
Qualitative crosstab analysis (with case nodes and
classification sheet data)
Coding comparison (with Cohen’s Kappa / Kappa
coefficient)
Compound queries
Group queries
Locational geographical mapping from social media
data
Ego neighborhood mapping for following network in
directed graphs
Various data tables
Various data visualizations (dendrograms, treemap
diagrams, word trees, ring lattices, cluster diagrams
2d, cluster diagrams 3d, and others)
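Similarity cluster analysis typically rests on a pairwise document-similarity measure; one common choice is cosine similarity over word counts. The sketch below uses that measure as an illustration, not as NVivo's specific metric:

```python
import math
from collections import Counter

def cosine_similarity(text_a, text_b):
    """Cosine similarity over raw word counts of two documents."""
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

print(round(cosine_similarity("coding the data", "coding the codebook"), 3))  # → 0.667
```

Documents with pairwise similarities above a threshold can then be grouped, which is the basis of the cluster diagrams listed above.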
46. BASIC SELECTED ANALYTICAL CAPABILITIES OF NVIVO
INCLUDE…(CONT.)
Autocoding from data (various forms of machine
learning)
Topic modeling (“distant reading” of texts and
extraction of topics)
Sentiment analysis
Coding by style
Coding by name in transcript
Use of existing coding patterns (machine copies human
manual codebook to dataset scale and computation
speed)
Autocoding from survey downloads (to case nodes
and topic modeling and sentiment analysis)
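Dictionary-based sentiment autocoding at sentence-level granularity can be sketched as follows. The positive/negative lexicons here are tiny illustrative stand-ins for a real sentiment dictionary:

```python
# Tiny illustrative lexicons -- stand-ins for a real sentiment dictionary.
POSITIVE = {"good", "great", "helpful", "clear"}
NEGATIVE = {"bad", "confusing", "poor", "slow"}

def sentence_sentiment(sentence):
    """Positive lexicon hits minus negative lexicon hits for one sentence."""
    words = [w.strip(".,!?") for w in sentence.lower().split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def classify(text):
    """Apply the dictionary at sentence-level granularity."""
    for s in text.split("."):
        if s.strip():
            score = sentence_sentiment(s)
            label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
            yield s.strip(), label

for sentence, label in classify("The workshop was great. The interface is confusing."):
    print(f"{label}: {sentence}")
```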
47. SOME SCREENSHOTS FROM AN ORIGINAL DEMO
PROJECT
FROM NVIVO 12 (ONE VERSION PRIOR TO LATEST) AND NVIVO (LATEST VERSION) ON WINDOWS
59. SOME ANALYTICAL APPLICATIONS OF NVIVO IN THE RESEARCH
LITERATURE
Manual coding of various research data and the extraction of manual codebooks (based on a variety of target
topics)
Reproducible / repeatable autocoded topic modeling to compare against human coding
Autocoded sentiment analysis (positive or negative sentiment) of text sets
Respondent profiling by topics of focus and sentiment
Codebook analysis (analysis of the code, whether manual or autocoded or combined)
Qualitative cross-tab analysis for data patterns of respondents by various attributes (demographic and others)
Social media data extractions (tweetsets from a microblogging site, poststreams from a social networking site,
social video from a social video sharing site with comments, and so on)
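The qualitative cross-tab analysis listed above — coded references tabulated against respondent attributes from a classification sheet — reduces to counting (code, attribute) pairs. The respondents, codes, and roles below are hypothetical:

```python
from collections import defaultdict

# Hypothetical coded references (respondent, code) plus a classification
# sheet mapping respondents to a demographic attribute (here, a role).
codings = [("r1", "workload"), ("r1", "tools"), ("r2", "workload"),
           ("r3", "tools"), ("r3", "tools"), ("r2", "morale")]
classification = {"r1": "faculty", "r2": "staff", "r3": "faculty"}

def crosstab(codings, classification):
    """Count coded references per (code, attribute) cell."""
    table = defaultdict(int)
    for respondent, code in codings:
        table[(code, classification[respondent])] += 1
    return dict(table)

print(crosstab(codings, classification))
```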
61. STRUCTURED VS. UNSTRUCTURED / SEMI-STRUCTURED DATA
Structured data
Labeled data in data tables
Each value in a cell is labeled by the column header and
the row header
Each value in a cell is identified by type of data with
attendant features
Unstructured, semi-structured data
Text
Imagery
Audio
Video
Multimodal, multimedia-based
* The argument for “semi-structured” vs. “unstructured”
is that there is no absolutely unstructured data unless
it’s randomness (even pseudo-randomness is not fully
unstructured). Natural language has an inherent
structure. Ditto storytelling, audio, video, and so on.
62. TYPES OF USABLE DATA IN NVIVO
Text files (incl. pdf)
Image files (maps, screenshots, photos, diagrams, and
others)
Audio files
Video files
Survey data (Qualtrics, Survey Monkey)
Web bibliography sources
Online notetaking sites
Email message and identity data
Excel workbooks
SPSS datasets (note the tie to quant methods from a
qual analytics tool…and vice versa)
NVivo projects (for team collaborations)
.qdc codebooks, .docx codebooks
(code category names and descriptions for what goes into
each category, not exemplars within the categories)
NVivo memos, NVivo reports, and others
* For multimedia, there have to be text equivalencies for
the imagery, audio, and video (transcripts)
63. USING CLEAN DATA
To have clean data, select the files purposefully.
Take out personally identifiable information (PII).
Ensure that metadata does not carry sensitive information (in the imagery, in the text files, in the video files, etc.)
Do not digitally annotate the files before you ingest those into the NVivo project, or you’ll have introduced noise
into your data. (If you want to annotate files, do so, but keep those files separate from the pristine ones that will
be ingested into the .nvp or .nvpx files.)
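As one small part of taking out PII, pattern-based scrubbing of emails and phone numbers can be sketched with regular expressions. These are naive, illustrative patterns; a vetted de-identification tool should be used for real human-subjects data:

```python
import re

# Naive, illustrative patterns for emails and US-style phone numbers; a
# vetted de-identification tool should be used for real research data.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.\w+")
PHONE = re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b")

def scrub(text):
    """Replace pattern-matched PII with placeholder tokens."""
    return PHONE.sub("[PHONE]", EMAIL.sub("[EMAIL]", text))

print(scrub("Contact Jo at jo@example.com or 785-555-0100."))  # → Contact Jo at [EMAIL] or [PHONE].
```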
64. ENSURING USABLE DATA TABLES AND DATA VISUALIZATIONS
OUTSIDE OF NVIVO
It helps to export data tables and data visualizations from NVivo unless you will have a forever license and assume that
the software will be forever available. NVivo is proprietary software, and you will need a version of the software to
open NVivo files.
An older version of NVivo cannot open newer .nvp or .nvpx files. Upgrading files will mean that the upgraded version of the
software is needed.
Record the exported contents with clear, consistent file-naming protocols, so you know what you are looking at when
you access the files later. Document the parameters used to extract each data table or data visualization and to run
various data queries and machine learning sequences, so you can represent them clearly in a publication or
presentation. Review the data; do “sanity checks” of the data before exporting and saving. [The data analytics process
is sequence-sensitive. The order of operations affects data at each step and the ultimate outcomes. Error introduced
at any one point potentially amplifies.]
Any raw data you ingest into an NVivo project should also be stored outside the project, so it is available for
reference external to the project file.
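Documenting the parameters of each export can be partially automated. A minimal provenance log might append one JSON record per exported table or visualization; the file name and parameter keys below are hypothetical:

```python
import datetime
import json
import os
import tempfile

def log_export(export_path, query, parameters, logfile):
    """Append one provenance record per exported table or visualization."""
    record = {
        "file": export_path,
        "query": query,
        "parameters": parameters,
        "exported_at": datetime.datetime.now().isoformat(timespec="seconds"),
    }
    with open(logfile, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record

# Hypothetical usage: one line per export, written to an append-only log.
log_path = os.path.join(tempfile.gettempdir(), "export_log.jsonl")
rec = log_export("wordfreq_focusgroup1_2024-05-01.csv",
                 "word frequency query",
                 {"min_word_length": 4, "grouping": "with stemmed words"},
                 log_path)
```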
65. DATA EXTRACTIONS FROM SOCIAL MEDIA
USING NCAPTURE (WEB BROWSER ADD-ON TO GOOGLE CHROME)
66. SOCIAL MEDIA DATASETS
Profiling social groups and sub-groups
Capturing a sense of mass mood / sentiment around particular topics
Identifying the most high-degree social nodes in a social network; mapping the social network to understand
dynamics
Mapping http networks from social media
Analyzing social images on social media (for content, for sentiment, for identified peoples, and others)
Identifying synthetic persons (‘bots)
Identifying general geographical locations of respective linked social accounts
67. ABOUT SOCIAL MEDIA DATA AND NCAPTURE
NCapture works on Google Chrome (and the aging-out unsupported Internet Explorer / IE web browser)
Various social media platforms are not supported in IE now, so using NCapture does not enable access to the
various platforms, like Facebook / Meta
Developers are working on a bridge to Facebook via Chrome, but that has not been available for many months
Access to social media data is rarely an n = 1 (without paying for the data from the social media platform
provider or a third-party source)
Given dynamism in the space (due to various dependencies and other factors), if you have a chance to collect the
social data, do so. Do not assume that the chance will always be there.
Take a screenshot of the landing page of the social account you’ve profiled, so you have the “state” of the account
at the time of the data capture. (You may have to go deeper for more than summary data.)
You can scrape images using third-party web browser add-ons to capture that data in thumbnail format.
68. LIMITS TO COMPUTATIONAL TEXT ANALYSIS IN NVIVO (IMHO)
Some limits:
may view words as individual unigrams and not bigrams, trigrams, four-grams, etc.; does not capture phrases
may have an insufficient stopwords list
does not understand negatives
does not understand humor
does not understand irony
does not understand external referents to a text or text corpus
machine logic and not human logic in “use existing coding patterns” machine emulation of human coding
is limited by human manual coding when using “use existing coding patterns”
does not capture an n = all in social media accounts with NCapture (unless the accounts have a limited amount of Tweet or
poststream contents) … given API (application programming interface) limits on the various social media platforms
There are commercial ways to acquire n = all datasets, but these require queries run online on “big data” datasets (using different data query
methods, like versions of structured query language)
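The unigram limitation noted above is easy to see against a generic n-gram extractor (a standard-library sketch, unrelated to NVivo's internals):

```python
def ngrams(tokens, n):
    """Sliding-window n-grams over a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "mixed methods research design".split()
print(ngrams(tokens, 2))  # → [('mixed', 'methods'), ('methods', 'research'), ('research', 'design')]
```

A unigram-only tool counts "mixed", "methods", "research", and "design" separately and misses the phrase "mixed methods" entirely.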
70. INDIVIDUALS AS RESEARCH INSTRUMENTS…
An individual researcher is a “research instrument”.
A group of researchers is a collaborative “research instrument”.
Researchers are sentients, and their aperture and vision and methods inform their power and capability.
Their social connections are part of their power and capability.
Their positionality—which they can change—affects what they have access to and what they can achieve.
71. CYBORGS HAVE MORE SKILLS TO DEPLOY
The building up of new knowledge and new skills in the computational space can enable an extension of the
researcher capabilities.
A “cyborg” is a “bionic” personage, who is both flesh and machine (as technical enhancement).
CAQDAS and other data analytics software have a forcing function: they force a researcher to become more
precise, to explicate, to explore, and ultimately to form a sense of the target research topic.
Technologies change the researcher. [Some see this as a strength. Others see this as a threat. People choose how to
wield and apply certain tools.]
72. THE “BIONICS” FOR THE RESEARCHER INCLUDE THE
FOLLOWING…
Technologies enable the capture of otherwise-inaccessible data, in vitro, in vivo, and in cyber. They extend collection.
Technologies enable varied discovery and exploration of the available data. Technologies enable rich review of
data. They extend perception.
Technologies enable permanent archival of data. They extend memory.
Technologies expand the set of askable research questions…and enable the testing of various hypotheses. They extend
asking and hypothesizing. They extend thinking.
Technologies can enable complementary insights to those attained by manual methods alone. They extend
learning. They extend conceptualization.
73. “BLACK BOX” ELEMENTS IN RESEARCH AND DATA ANALYSIS
There are “black box” elements to both human cognition and
computational machines and methods.
For all the effort that goes into transparency in research and
computational sequences, there are “inexplicables” (as in ANNs
and how neural networks process data, although computer
scientists are getting closer to some explanations; as in human
intuitions and cognitive leaps; as in the workings of the human
subconscious and unconscious).
Then again, not everything has to be fully understood and
explicated.
75. GETTING STARTED WITH CAQDAS
CAQDAS is Computer-Assisted Qualitative Data Analysis Software.
Study how computation is applied to various qualitative data analytics challenges…and what may be asserted from
various analytics.
Explore the available software tools and their respective capabilities.
Decide which software programs provide the capabilities you will use.
Decide which software tools have a comfortable user interface.
There are some free software tools available, if you’re comfortable with command line (vs. graphical user interface).
Go for it! Start slow. Be nice to yourself. Be nice to others. Build the skillset. Share your knowledge, skills, and
abilities (KSAs).
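As a taste of what the free, code-driven end of the spectrum looks like, here is a minimal keyword-based coding sketch in Python. (The codebook and interview segments are invented for illustration; real CAQDAS tools add code hierarchies, memoing, and inter-rater comparison on top of this basic mechanic.)

```python
# Hypothetical codebook: code name -> trigger keywords.
CODEBOOK = {
    "access": {"access", "barrier", "afford"},
    "trust": {"trust", "confide", "privacy"},
}

def code_segment(segment, codebook=CODEBOOK):
    """Return the sorted list of codes whose keywords appear in a segment."""
    words = {w.lower().strip(".,;!?") for w in segment.split()}
    return sorted(code for code, keys in codebook.items() if words & keys)

segments = [
    "I could not afford the software license.",
    "Privacy was my biggest worry; I did not trust the platform.",
]
for seg in segments:
    print(code_segment(seg), "<-", seg)
```

Even this toy version shows the forcing function mentioned above: building the codebook requires explicating what each code actually means in words.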
76. CONTACT
Dr. Shalin Hai-Jew
ITS
Kansas State University
shalin@ksu.edu
785-532-5262
Gentle caveat: This presentation uses one software tool to bridge to CAQDAS. There are many other tools and
methods and capabilities…
Resource: Using NVivo: An Unofficial and Unauthorized Primer
77. EXTRA: AND A DIFFERENT FAVORITE CAQDAS TOOL: LIWC-22
LIWC-22*
[validated instrument trained on a number of natural-language datasets; measures psychometrics, linguistic features, and sentiment;
provides four summary scale scores: (1) analytical thinking, (2) clout, (3) authenticity / warmth, (4) emotional tone (sentiment), among
other foci; enables custom dictionaries focused on various objectives; has versions in a number of different languages]
[related to a variety of insightful research]
[runs on Windows]
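The basic mechanic behind dictionary-based tools of this kind can be sketched as a word-category percentage. (LIWC-22’s validated dictionaries and scaled scores are proprietary; the category words below are made-up stand-ins, not LIWC’s.)

```python
def category_percent(text, category_words):
    """Percent of tokens matching a category dictionary -- the core
    move in dictionary-based text measurement."""
    tokens = [w.lower().strip(".,;!?") for w in text.split()]
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in category_words)
    return 100.0 * hits / len(tokens)

# Hypothetical "positive emotion" dictionary for illustration only.
POSITIVE = {"good", "happy", "great", "hope"}
print(category_percent("The results were good and we are happy.", POSITIVE))
```

The validated instrument’s value lies in its curated, tested dictionaries and norms, not in this simple arithmetic.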
78. EXTRA: SOME ANALYTICAL APPLICATIONS OF LIWC IN THE
RESEARCH LITERATURE
Author identification (historical and present), (in)validation of authorship
Predictive analytics
Fraud detection (people and “self-invisible” tells)
Suicide intervention
Remote personality (ego and entity) profiling (longitudinal, episodic)
Political leader profiling
Political group profiling
Ideology profiling
Human landscape mapping, social network mapping
Elicitation of power dynamics
Terrorist group mapping