Poster: Digital Qualitative Codebook
In qualitative data analytics, computation is seen to bolster and complement the work of the human
researcher
Data may include structured (labeled table data) and unstructured / semi-structured data (texts, imagery,
audio, video, multimedia, and others)
Data used in CAQDAS tools are human-curated and human-cleaned (prepared)
How the data are emplaced in the software affects what questions may be asked of the data
It is important to keep clear records of the data analytics conducted on the data and the applied
parameters
Supervised machine learning uses labeled data to code unlabeled data
Unsupervised machine learning does not use human-coded data
QDA and CAQDAS tools enable some reproducibility of information (a value from quant approaches)
The software enables data exploration (word frequency counts, text trees, matrices, code analyses, coding
comparisons, qualitative crosstab analyses, compound queries, and others)
The software enables sentiment analysis, topic modeling, social network analysis, similarity analyses
The software enables various data visualizations (treemap diagrams, 3D bar charts, cluster diagrams in 3D,
cluster diagrams in 2D, dendrograms, network graphs, lattice graphs, and others)
Much of the computational analysis is text-based, so multimedia have to be transcribed or transcoded into
text form
Natural language is a key component of QDA and CAQDAS tools
The text analyses are often multilingual (covering whatever languages are enabled through UTF-8 and UTF-16 encoding)
One superpower in at least one CAQDAS tool is to have a human-created codebook made with enough examples of
text coded to each category and then to have the software apply that coding pattern to uncoded text
Researchers want to avoid “reifying” constructs or seeing phenomena that may not exist in the real world (given
human cognitive biases and limits)
It is possible to compare the coding between individuals and teams (with a computed Kappa Coefficient /
Cohen’s Kappa)
In a context in which there is a shared codebook and shared data, and the coders are similarly trained,
having a Kappa coefficient of 0.6 to 0.8 may be sufficient agreement to validate a particular construct (on a
scale where 0 is agreement no better than chance and 1 is full agreement in the coding)
Different disciplines have different standards for the Kappa coefficient
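The Kappa computation described above can be sketched in a few lines. This is a minimal illustration, assuming two coders who each assigned exactly one code to the same ordered list of text segments; the function name cohens_kappa and the sample codes ("barrier", "support") are hypothetical and do not come from any particular CAQDAS tool.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who each assigned one code per segment."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must code the same non-empty list of segments")
    n = len(coder_a)
    # Observed agreement: share of segments that received the same code from both coders.
    p_observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Expected chance agreement, from each coder's marginal code frequencies.
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    p_expected = sum(freq_a[code] * freq_b[code] for code in freq_a) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both coders applied one identical code to every segment
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical coding of ten segments by two coders (they differ on one segment).
coder_a = ["barrier", "barrier", "support", "support", "barrier",
           "support", "barrier", "support", "support", "barrier"]
coder_b = ["barrier", "barrier", "support", "support", "barrier",
           "support", "support", "support", "support", "barrier"]

kappa = cohens_kappa(coder_a, coder_b)
print(round(kappa, 2))
```

In this made-up example the coders agree on 9 of 10 segments (observed agreement 0.9) while chance agreement is 0.5, so Kappa is (0.9 - 0.5) / (1 - 0.5) = 0.8, at the upper end of the range mentioned above.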
Codebooks may also be released in electronic format (such as REFI-QDA or .qdc format), which enables
interoperability between different Qualitative Data Analysis Software (QDA) or Computer-Assisted
Qualitative Data Analysis Software (CAQDAS) programs
REFI = Rotterdam Exchange Format Initiative
.qdc = the Codebook Exchange Format
What transfers is only the code name and directions for what to code to that category
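As a rough sketch of what such an exchange file carries, the Python below writes and reads back a small codebook as XML holding only code names and coding directions. The namespace string and the element and attribute names (CodeBook, Codes, Code, Description, guid, name, isCodable) are assumptions recalled from the REFI-QDA codebook schema and should be checked against the published specification before use; the sample codes are hypothetical.

```python
import uuid
import xml.etree.ElementTree as ET

# Assumed namespace for the codebook exchange format; verify against the official schema.
NS = "urn:QDA-XML:codebook:1.0"


def build_codebook(codes):
    """Build an XML tree from (code name, coding directions) pairs."""
    ET.register_namespace("", NS)  # serialize with a default namespace, no prefix
    root = ET.Element(f"{{{NS}}}CodeBook")
    codes_el = ET.SubElement(root, f"{{{NS}}}Codes")
    for name, directions in codes:
        code_el = ET.SubElement(
            codes_el,
            f"{{{NS}}}Code",
            {"guid": str(uuid.uuid4()), "name": name, "isCodable": "true"},
        )
        ET.SubElement(code_el, f"{{{NS}}}Description").text = directions
    return root


def read_codebook(xml_text):
    """Recover (code name, coding directions) pairs from the XML text."""
    root = ET.fromstring(xml_text)
    return [
        (code.get("name"), code.findtext(f"{{{NS}}}Description", default=""))
        for code in root.iter(f"{{{NS}}}Code")
    ]


# Hypothetical two-code codebook.
codes = [
    ("Barrier", "Code any obstacle the informant describes."),
    ("Support", "Code any help or resource the informant describes."),
]

xml_text = ET.tostring(build_codebook(codes), encoding="unicode")
round_trip = read_codebook(xml_text)
print(xml_text)
```

Only names and directions survive the round trip here, which mirrors the point above: coded passages and source data are not part of the codebook file.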
Codes should be generally mutually exclusive
Depending on the coding rules, text may be coded into multiple categories if that text is multi-dimensional
The syntax of a codebook should be fairly consistent
A codebook should apply across a range of contexts if it is to be useful to others (to be heritable)
For citation purposes, codebooks should have a clear name and a memorable and non-offensive acronym
As computation advances, perhaps codebooks will have additional capabilities
(All images in this poster are by the Deep Dream Generator, 2023)
Digital Qualitative Codebooks
There is informational value to everything; it’s a matter of how to look at a thing to
extract informational value
Qualitative research involves people
Unavoidably, the human researcher and coder bring themselves to the work
Data analytics is necessarily subjective
Qualitative research is socially informed; positionality matters
Informants know information that others do not
Researchers need to acknowledge biases and go from there
Meaning and sensemaking are often made in socially constructive ways
Every researcher has a unique “coding fist” that reflects who they are and how they’re trained
Qualitative researchers have a long history of manual coding in the analog space (paper and sticky notes,
whiteboards, etc.)
Top-down coding is based on theory, framework, model, particular research questions, etc.
Bottom-up coding involves coding data based on grounded theory (without an overarching framework);
the coder reviews the data to see what emerges as important
The coding may begin with a draft codebook or without
Coding may involve both top-down and bottom-up coding
Coding to “saturation” involves capturing all possible relevant ideas from the target space and constructs
(in available collected data)
Essentially, a codebook consists of a set of codes (categories of information) and directions for what
information goes into each category
A codebook may have upper level categories and subcategories and sub-sub categories (or top-level
nodes, child nodes, grandchild nodes, great grandchild nodes, and so forth)
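The nested structure described above (codes with directions, arranged as top-level, child, and grandchild nodes) can be modeled as a simple tree. This is one possible sketch, not how any specific CAQDAS program stores its codebooks; the Code class, its field names, and the sample codes are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Code:
    """One codebook node: a category name plus directions for what to code to it."""
    name: str
    directions: str
    children: List["Code"] = field(default_factory=list)

    def add(self, child):
        """Attach a subcategory and return it so nesting can continue."""
        self.children.append(child)
        return child

    def walk(self, depth=0):
        """Yield (depth, code) pairs, parent first, then its descendants."""
        yield depth, self
        for child in self.children:
            yield from child.walk(depth + 1)


# Hypothetical codebook fragment: top-level node, child nodes, one grandchild node.
root = Code("Barriers", "Any obstacle the informant describes")
cost = root.add(Code("Cost", "Financial obstacles"))
cost.add(Code("Tuition", "Mentions of tuition or fees"))
root.add(Code("Time", "Scheduling conflicts or time pressure"))

outline = [(depth, code.name) for depth, code in root.walk()]
for depth, name in outline:
    print("  " * depth + name)
```

Keeping the directions on every node, not just the top-level ones, reflects the idea that each category at each level needs its own guidance on what belongs in it.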
Codebooks may be made by individuals
Codebooks may be consensus ones and made by groups or teams
Codebooks may be computationally created
Codebooks (as instruments) are often published with the academic article or chapter (such as in the
appendices)
Codebooks should have a clear representative name (and a smooth acronym)
Basic tenets of qualitative research
Validating constructs qualitatively and computationally
Basic tenets of qualitative data analytics (with computation)
Dr. Shalin Hai-Jew
Kansas State University
About qualitative coding
Sharing codebooks electronically
What is a codebook?