Every researcher is a cyborg! Academic researchers engage in various sorts of research in vitro (in glass) and in vivo (in the living body): they conduct experimental laboratory work and analyze data from natural in-world experiments. In between, many conduct surveys, focus groups, interviews, and other types of research. In the computer-assisted qualitative data analysis software (CAQDAS) space, NVivo is one of the foremost tools, enabling the creation of manual codebooks, multimedia analysis, and various forms of “auto” or unsupervised machine learning. NVivo works as a “database” for structured and unstructured (multimedia) data, and it enables the drawing of content from various social media sites. Technologies augment human analytical capabilities in both the qualitative and quantitative research spaces. This presentation demonstrates some of the capabilities of NVivo and addresses how a researcher is changed by the computational capabilities they harness.
Creating Seeding Visuals to Prompt Art-Making Generative AIs – Shalin Hai-Jew
Art-making generative AIs have come to the fore. A basic work pipeline typically starts with a text prompt that yields a generated image; that image may then be used to seed further iterations. Deep Dream Generator (DDG) also enables “modifiers” of various types (artist styles, visual adjectives, and others) to be applied in addition to the text prompt.
Another approach involves beginning with a “seeding image,” a born-digital or digitized (born-analog) visual on which AI-generated art may be based for a multi-channel and multi-modal prompt. This slideshow provides some observations of how to think about seeding images, particularly in terms of how the DDG handles them, with its “algorithmic pareidolia” (“Deep Dream,” Wikipedia, July 3, 2023).
Human art-making is often about sparking mass-scale conversations. Artists are thought to help bridge humanity into the future. Whether generative AI art enables this is still unclear.
LIWC-ing at Texts for Insights from Linguistic Patterns – Shalin Hai-Jew
Since the mid-1990s, researchers have been using the Linguistic Inquiry and Word Count (LIWC pronounced “luke”) software tool to explore various text corpora for hidden insights from linguistic patterns. The LIWC tool has evolved over the years. Simultaneously, research using computational text analysis has evolved and shed light on areas of deception, threat assessment, personality, predictive analytics, and other areas. This presentation will highlight some of the applications of LIWC in the research literature and showcase the tool on some original text sets.
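As a rough illustration of what LIWC-style analysis computes, here is a minimal Python sketch of dictionary-based category counting. The mini-dictionary and category names below are hypothetical; the actual LIWC dictionaries are proprietary and far more extensive.

```python
# A minimal sketch of LIWC-style word counting, assuming a toy dictionary.
from collections import Counter
import re

# Hypothetical mini-dictionary: category -> word stems ("*" = prefix match)
CATEGORIES = {
    "negemo": ["hate", "worr*", "fear*"],
    "posemo": ["love", "happ*", "great"],
    "social": ["friend*", "talk*", "we", "they"],
}

def liwc_counts(text):
    """Return per-category hits as percentages of total word count."""
    words = re.findall(r"[a-z']+", text.lower())
    hits = Counter()
    for w in words:
        for cat, patterns in CATEGORIES.items():
            for p in patterns:
                if (p.endswith("*") and w.startswith(p[:-1])) or w == p:
                    hits[cat] += 1   # a word may score in several categories
                    break
    total = len(words) or 1
    return {cat: 100.0 * n / total for cat, n in hits.items()}

print(liwc_counts("We talked with friends and felt happy, not worried."))
```

The real tool reports dozens of validated categories (function words, affect, cognitive processes, etc.) rather than this toy set, but the underlying computation is the same normalized category counting.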
Exploring the Deep Dream Generator (an Art-Making Generative AI) – Shalin Hai-Jew
The Deep Dream Generator grew out of the DeepDream technique created by Google engineer Alexander Mordvintsev in 2015. It has a public-facing instance at https://deepdreamgenerator.com/, which enables people to use text prompts and image prompts (individually or in combination) to inspire the art-generating generative AI to output images. This work highlights some process-based walk-throughs of the tool, some practical uses, some lightweight art learning, some aspects of the online social community on the platform, and other insights. Some works by the AI prompted by the presenter may be seen here: https://deepdreamgenerator.com/u/sjjalinn.
(This is the first draft of a slideshow that will be used in a conference later in the year.)
Ethical Considerations of Qualitative Research – N. Mach
Ethical Considerations can be specified as one of the most important parts of the research. ... Research participants should not be subjected to harm in any way whatsoever. Respect for the dignity of research participants should be prioritized. Full consent should be obtained from the participants prior to the study. (Research Methodology)
My presentation at The Richmond Data Science Community (Jan 2018). The slides are slightly different from what I presented last year at The Data Intelligence Conference.
How do we protect users' privacy when building large-scale AI-based systems? How do we develop machine-learned models and systems that take fairness, accountability, and transparency into account? With the ongoing explosive growth of AI/ML models and systems, these are some of the ethical, legal, and technical challenges encountered by researchers and practitioners alike. In this talk, we will first motivate the need for adopting a "fairness and privacy by design" approach when developing AI/ML models and systems for different consumer and enterprise applications. We will then focus on the application of fairness-aware machine learning and privacy-preserving data mining techniques in practice, by presenting case studies spanning different LinkedIn applications (such as fairness-aware talent search ranking, privacy-preserving analytics, and LinkedIn Salary privacy & security design), and conclude with the key takeaways and open challenges.
Virginia Dignum – Responsible Artificial Intelligence – NEXT Conference
As Artificial Intelligence (AI) systems are increasingly making decisions that directly affect users and society, many questions arise across social, economic, political, technological, legal, ethical, and philosophical issues. Can machines make moral decisions? Should artificial systems ever be treated as ethical entities? What are the legal and ethical consequences of human enhancement technologies, or cyber-genetic technologies? How should moral, societal, and legal values be part of the design process? In this talk, we look at ways to ensure ethical behaviour by artificial systems. Given that ethics are dependent on the socio-cultural context and are often only implicit in deliberation processes, methodologies are needed to elicit the values held by designers and stakeholders, and to make these explicit, leading to better understanding of and trust in artificial autonomous systems. We will focus in particular on the ART principles for AI: Accountability, Responsibility, Transparency.
"Introduction to Research Methodology" covers various aspects of research methodology. The presentation begins with an introduction to research methodology, highlighting its systematic approach and techniques for conducting research. It emphasizes the importance of research methodology in ensuring valid, reliable, and credible research findings. The key components of research methodology, such as research design, sampling techniques, data collection methods, and data analysis techniques, are discussed in detail. The presentation explores quantitative and qualitative research methods, as well as mixed methods research, which combines both approaches. Ethical considerations in research, including informed consent, confidentiality, and protection of participants' rights, are emphasized. The concepts of validity and reliability in research are explained, stressing their significance in ensuring accurate and consistent results. The role of research ethics committees, also known as institutional review boards (IRBs), in overseeing ethical research practices is highlighted. The presentation concludes by underscoring the importance of effective research reporting, emphasizing the need for clear and structured research reports to share research outcomes with the scientific community and a wider audience.
This paragraph summarizes the content covered in the 15 slides of the "Introduction to Research Methodology" presentation.
Qualitative data analysis: many approaches to understand user insights – Agnieszka Szóstek
The fifth lecture at the HIT Lab, University of Canterbury, New Zealand, was all about how important it is to run a proper analysis of qualitative data. We discussed the value of looking at data from an individual (phenomenological) perspective versus a combined (reductionist) perspective. But we agreed that, regardless of the chosen approach, it is crucial to look at the data from more than one perspective to be sure the interpretation is not biased by the researcher's own view of the world.
These slides cover current issues in nursing research and envision its future scope: the journey from nursing research to nurse scientist.
Tools and techniques in qualitative and quantitative research – Deepikakohli10
The presentation is about different tools and techniques used in research. It will help students, teachers, researchers, and teacher educators select appropriate tools and techniques for their research purposes.
Long nonfiction chapters are not in style and may never have been. While nonfiction book chapters average about 4,000–7,000 words in length, some run to several times that upper bound. The explanation is that there is some irreducible complexity in what the chapter addresses that cannot be handled in shorter form. This slideshow explores some methods for writing longer chapters while still maintaining coherence, focus, and reader interest…and while using some technological tools to write and edit more efficiently.
Overcoming Reluctance to Pursuing Grant Funds in Academia – Shalin Hai-Jew
Starting as an organization’s new grant writer can be a challenge, especially in a case where there has been a time lapse since the last one left. People get out of the habit of pursuing grant funds. This slideshow addresses some of the reasons for such reluctance and proposes some ways to mitigate these.
Writing grants is one common way that those in institutions of higher education may acquire funds—small and big, one-off and continuing—to conduct research; hire faculty, researchers, learners, and others; update equipment; update existing buildings or build new ones; and achieve other work. This slideshow explores some aspects of the work of grant writing in the present moment in higher education.
Contrasting My Beginner Folk Art vs. Machine Co-Created Folk Art with an Art-... – Shalin Hai-Jew
The SARS-CoV-2 pandemic inspired several years of experimentation with common or folk art, involving mixed media, alcohol ink painting, and other explorations. Then, with the emergence of art-making generative AIs, there were further experiments, particularly with one that enables generation of visuals from scanned art and photos, text prompts, style overlays, and text-based visual modifiers. While both types of artmaking are emotionally satisfying and helpful for stress management, there are some contrasting differences. This exploratory slideshow explores some of these differences in order to partially shed light on the informal usage of an art-making generative AI (artificial intelligence).
Common Neophyte Academic Book Manuscript Reviewer Mistakes – Shalin Hai-Jew
The work of academic book manuscript reviewing, most often as a volunteer, is a common academic practice. The presenter served as a neophyte reviewer for some years before settling into this invited volunteer work over several decades. There have been lessons learned over time about avoidable mistakes…from both experience and observation.
Fashioning Text (and Image) Prompts for the CrAIyon Art-Making Generative AI – Shalin Hai-Jew
CrAIyon (formerly DALL·E mini, a nod to Salvador Dalí) is a web-facing art-making generative AI tool online (https://www.craiyon.com/) that enables the use of text (and image) prompts for the creation of watermarked, lightweight visuals. Counterintuitively, the rough visuals are much more usable for recombination, remixing, and recreation into usable digital visuals for various digital learning objects. The textual prompts are not particularly intuitive because of how the generative AI program was trained on mass-scale visuals. There is an art, and occasional indirection, to reworking prompts after each try, with the resulting nine-image proof sheets that CrAIyon outputs. The tool can be used iteratively for different outputs.
The tool sometimes turns out serendipitous surprises, including an occasional work so refined that it can be used / shared almost unedited. One challenge in using CrAIyon comes from its request for credit (from all non-subscribers to the service). Another comes from the visual watermarking (an orange crayon at the bottom right of the image). However, this tool is quite useful for practical applications if one is willing to engage in deep digital image editing (Adobe Photoshop, Adobe Illustrator).
Augmented Reality in Multi-Dimensionality: Design for Space, Motion, Multiple... – Shalin Hai-Jew
Augmented reality (AR)—the use of digital overlays over physical space—manifests in a wide range of spaces (indoor, outdoor; virtual) and ways (in real space (with unaided human vision); in head gear; in smart glasses; on mobile devices, and others). There are various authoring technologies that enable the making of AR experiences for various users. This work uses a particular tool (Adobe Aero®) to explore ways to build AR for multiple dimensions, including the fourth dimension (motion, changes over time).
Based on the respective purposes of the AR experience, some basic heuristics are captured for (1) space design, (2) motion design, (3) multiple-perception design (sight, smell, taste, sound, touch), and (4) virtual and tangible interactivity.
Some Ways to Conduct SoTL Research in Augmented Reality (AR) for Teaching and... – Shalin Hai-Jew
One of the extant questions about augmented reality (AR) is how (in)effective it is for the teaching and learning in various formal, nonformal, and informal contexts. The research literature shows mixed findings, which are often highly context-based (and not generalizable). There are some non-trivial costs to the design/development/deployment of AR for teaching and learning. For the users, there is cognitive load on the working memory [(1) extraneous/poor design, (2) intrinsic/inherent difficulty in topic, and (3) germane/forming schemas]. For teachers, there are additional knowledge, skills, and abilities / attitudes (KSAs) that need to be brought to bear.
Augmented Reality for Learning and Accessibility – Shalin Hai-Jew
Recently, the presenter conducted a systematic review of the academic literature and an environmental scan to learn how to set up an augmented reality (AR) shop at an institution of higher education. The ambition was to not only set up AR in an accessible and legal way but also be able to test for potential +/- effects of AR on teaching and learning. The research did not go past the review stage, because of a lack of funding, but some insights about accessibility in AR were acquired.
(The visuals are from Deep Dream Generator and CrAIyon.)
Engaging Pixabay as an open-source contributor to hone digital image editing,... – Shalin Hai-Jew
This slideshow describes the author's early experiences with creating two accounts on Pixabay in order to advance digital editing skills in multimedia. The two accounts are located at https://pixabay.com/users/sjjalinn-28605710/ and https://pixabay.com/users/wavegenerics-29440244/ ...
This work explores four main spaces where researchers publish about educational technology: academic-commercial, open-access, open-source, and self-publishing.
Human-Machine Collaboration: Using art-making AI (CrAIyon) as cited work, o... – Shalin Hai-Jew
It is early days for generative art AIs. What are some ways to use these to complement one's work while staying legal (legal-ish)?
Correction: .webp is a raster format
Getting Started with Augmented Reality (AR) in Online Teaching and Learning i...Shalin Hai-Jew
University creative shops are exploring whether they can get into the game of producing AR-enhanced experiences: campus tours, interactive gaming, virtual laboratories, exploratory art spaces, simulations, design labs, online / offline / blended teaching and learning modules, and other AR applications.
This work offers a basic environmental scan of the AR space for online teaching and learning, and it includes pedagogical design leads from the current research, technological knowhow, hands-on design / development / deployment of learning objects, and online teaching and learning methods.
Co-Creating Common Art with the CrAIyon AI – Shalin Hai-Jew
This slideshow contains a variety of images created using the CrAIyon AI...based on seeding terms. This work asks questions about common art in an age of AI.
This is the revised intro to Adobe Animate set of notes used in a training in late June 2022. The Word version is downloadable from www.k-state.edu/ID/AdobeAnimateHandout.docx, with the motion available from the animated .gifs.
"Drift" is the latest in the alcohol ink drip playing series. After reaching the first learning plateau a year and a half in, I am finding second wind. This is all still fun.
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake – Walaa Eldin Moustafa
Dynamic policy enforcement is becoming an increasingly important topic in today’s world, where data privacy and compliance are a top priority for companies, individuals, and regulators alike. In these slides, we discuss how LinkedIn implements a powerful dynamic policy enforcement engine, called ViewShift, and integrates it within its data lake. We show the query engine architecture and how catalog implementations can automatically route table resolutions to compliance-enforcing SQL views. Such views have a set of very interesting properties: (1) They are auto-generated from declarative data annotations. (2) They respect user-level consent and preferences. (3) They are context-aware, encoding a different set of transformations for different use cases. (4) They are portable; while the SQL logic is implemented in only one SQL dialect, it is accessible in all engines.
#SQL #Views #Privacy #Compliance #DataLake
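The idea of auto-generating a compliance-enforcing view from declarative column annotations can be sketched roughly as follows. This Python sketch is illustrative only: the table name, annotation keys, and masking rules are hypothetical, not LinkedIn's actual ViewShift design.

```python
# Hypothetical sketch: build a compliance-enforcing SQL view from
# per-column annotations (annotation names and masks are illustrative).

MASKS = {
    "email": "sha2({col}, 256)",   # pseudonymize sensitive columns
    "none": "{col}",               # pass through non-sensitive columns
}

def compliance_view(table, columns):
    """columns: list of (name, annotation) pairs -> CREATE VIEW statement."""
    exprs = []
    for name, ann in columns:
        template = MASKS.get(ann, "NULL")  # unknown annotations are nulled out
        exprs.append(f"{template.format(col=name)} AS {name}")
    return (f"CREATE VIEW {table}_compliant AS\n"
            f"SELECT {', '.join(exprs)}\n"
            f"FROM {table}")

print(compliance_view("members", [("id", "none"), ("email", "email")]))
```

A catalog layer could then resolve queries against `members` to `members_compliant` transparently, which is the routing behavior the slides describe.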
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23... – John Andrews
SlideShare Description for "Chatty Kathy - UNC Bootcamp Final Project Presentation"
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
Adjusting primitives for graph: SHORT REPORT / NOTES – Subhajit Sahu
These notes concern primitives used by graph algorithms like PageRank. Compressed Sparse Row (CSR) is an adjacency-list-based graph representation.
Multiply with different modes (map):
1. Performance of sequential vs. OpenMP-based vector multiply.
2. Comparing various launch configs for CUDA-based vector multiply.
Sum with different storage types (reduce):
1. Performance of vector element sum using float vs. bfloat16 as the storage type.
Sum with different modes (reduce):
1. Performance of sequential vs. OpenMP-based vector element sum.
2. Performance of memcpy vs. in-place CUDA-based vector element sum.
3. Comparing various launch configs for CUDA-based vector element sum (memcpy).
4. Comparing various launch configs for CUDA-based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce):
1. Comparing various launch configs for CUDA-based vector element sum (in-place).
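The CSR representation referenced in these notes can be sketched as follows. This is a minimal Python illustration only; the reports themselves concern C++/OpenMP/CUDA implementations.

```python
# A minimal sketch of Compressed Sparse Row (CSR): one offsets array and
# one flat edge-target array, giving contiguous per-vertex neighbor slices.

def to_csr(num_vertices, edges):
    """edges: list of (u, v) pairs -> (offsets, targets) CSR arrays."""
    degree = [0] * num_vertices
    for u, _ in edges:
        degree[u] += 1
    offsets = [0] * (num_vertices + 1)
    for u in range(num_vertices):
        offsets[u + 1] = offsets[u] + degree[u]
    targets = [0] * len(edges)
    fill = offsets[:-1].copy()        # next free slot per vertex
    for u, v in edges:
        targets[fill[u]] = v
        fill[u] += 1
    return offsets, targets

def neighbors(offsets, targets, u):
    """Out-neighbors of u as one contiguous slice."""
    return targets[offsets[u]:offsets[u + 1]]

offsets, targets = to_csr(3, [(0, 1), (0, 2), (2, 0)])
print(neighbors(offsets, targets, 0))
```

The contiguous layout is what makes CSR cache-friendly for the map/reduce primitives benchmarked above.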
Adjusting OpenMP PageRank: SHORT REPORT / NOTES – Subhajit Sahu
For massive graphs that fit in RAM, but not in GPU memory, it is possible to take advantage of a shared-memory system with multiple CPUs, each with multiple cores, to accelerate PageRank computation. If the NUMA architecture of the system is properly taken into account with good vertex partitioning, the speedup can be significant. To take steps in this direction, experiments are conducted to implement PageRank in OpenMP using two different approaches, uniform and hybrid. The uniform approach runs all primitives required for PageRank in OpenMP mode (with multiple threads). On the other hand, the hybrid approach runs certain primitives (i.e., sumAt, multiply) in sequential mode.
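As a rough sketch of how the PageRank iteration factors into such primitives, here is a sequential Python version. The primitive names in the comments are only loosely modeled on the report's sumAt and multiply; the real implementations run these steps in OpenMP (and the actual code is C++).

```python
# A sequential sketch of power-iteration PageRank over a CSR graph,
# with the two primitives named in the report marked in comments.

def pagerank(offsets, targets, n, damping=0.85, tol=1e-10, iters=100):
    """offsets/targets: CSR arrays of the graph's out-edges."""
    ranks = [1.0 / n] * n
    out_deg = [offsets[u + 1] - offsets[u] for u in range(n)]
    for _ in range(iters):
        # "multiply"-style primitive: per-vertex contribution = rank / degree
        contrib = [ranks[u] / out_deg[u] if out_deg[u] else 0.0
                   for u in range(n)]
        # "sumAt"-style primitive: accumulate contributions along edges
        new = [(1 - damping) / n] * n
        for u in range(n):
            for v in targets[offsets[u]:offsets[u + 1]]:
                new[v] += damping * contrib[u]
        err = sum(abs(a - b) for a, b in zip(new, ranks))  # L1 error
        ranks = new
        if err < tol:
            break
    return ranks

# 3-vertex cycle 0 -> 1 -> 2 -> 0: ranks converge to about 1/3 each
print(pagerank([0, 1, 2, 3], [1, 2, 0], 3))
```

In the uniform approach both loops would run under OpenMP threads; in the hybrid approach the cheaper primitives stay sequential to avoid parallel overhead.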
Global Situational Awareness of A.I. and Where It's Headed – vikram sood
You can see the future first in San Francisco.
Over the past year, the talk of the town has shifted from $10 billion compute clusters to $100 billion clusters to trillion-dollar clusters. Every six months another zero is added to the boardroom plans. Behind the scenes, there’s a fierce scramble to secure every power contract still available for the rest of the decade, every voltage transformer that can possibly be procured. American big business is gearing up to pour trillions of dollars into a long-unseen mobilization of American industrial might. By the end of the decade, American electricity production will have grown tens of percent; from the shale fields of Pennsylvania to the solar farms of Nevada, hundreds of millions of GPUs will hum.
The AGI race has begun. We are building machines that can think and reason. By 2025/26, these machines will outpace college graduates. By the end of the decade, they will be smarter than you or I; we will have superintelligence, in the true sense of the word. Along the way, national security forces not seen in half a century will be unleashed, and before long, The Project will be on. If we’re lucky, we’ll be in an all-out race with the CCP; if we’re unlucky, an all-out war.
Everyone is now talking about AI, but few have the faintest glimmer of what is about to hit them. Nvidia analysts still think 2024 might be close to the peak. Mainstream pundits are stuck on the wilful blindness of “it’s just predicting the next word”. They see only hype and business-as-usual; at most they entertain another internet-scale technological change.
Before long, the world will wake up. But right now, there are perhaps a few hundred people, most of them in San Francisco and the AI labs, that have situational awareness. Through whatever peculiar forces of fate, I have found myself amongst them. A few years ago, these people were derided as crazy—but they trusted the trendlines, which allowed them to correctly predict the AI advances of the past few years. Whether these people are also right about the next few years remains to be seen. But these are very smart people—the smartest people I have ever met—and they are the ones building this technology. Perhaps they will be an odd footnote in history, or perhaps they will go down in history like Szilard and Oppenheimer and Teller. If they are seeing the future even close to correctly, we are in for a wild ride.
Let me tell you what we see.
Analysis insight about a Flyball dog competition team's performance – roli9797
Insight from my analysis of a Flyball dog competition team's performance last year. Find more: https://github.com/rolandnagy-ds/flyball_race_analysis/tree/main
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag... – sameer shah
"Join us for STATATHON, a dynamic 2-day event dedicated to exploring statistical knowledge and its real-world applications. From theory to practice, participants engage in intensive learning sessions, workshops, and challenges, fostering a deeper understanding of statistical methodologies and their significance in various fields."
Techniques to optimize the PageRank algorithm usually fall into two categories. One is to try reducing the work per iteration, and the other is to try reducing the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged has the potential to save iteration time. Skipping in-identical vertices, with the same in-links, helps reduce duplicate computations and thus could help reduce iteration time. Road networks often have chains which can be short-circuited before PageRank computation to improve performance; final ranks of chain nodes can be easily calculated. This could reduce both the iteration time and the number of iterations. If a graph has no dangling nodes, the PageRank of each strongly connected component can be computed in topological order. This could help reduce the iteration time and the number of iterations, and also enable multi-iteration concurrency in PageRank computation. The combination of all of the above methods is the STICD algorithm [sticd]. For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
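One of the optimizations above, skipping computation on vertices which have already converged, can be sketched as follows. This is an illustrative Python version under simplifying assumptions (a frozen vertex is never re-checked here; real implementations handle late-arriving changes from neighbors more carefully).

```python
# Sketch of PageRank with per-vertex convergence skipping: once a vertex's
# rank change falls below tol, it is frozen and skipped in later iterations.

def pagerank_skip(graph, damping=0.85, tol=1e-12, iters=200):
    """graph: {vertex: [out-neighbors]} -> {vertex: rank}."""
    n = len(graph)
    ranks = {v: 1.0 / n for v in graph}
    converged = {v: False for v in graph}
    incoming = {v: [] for v in graph}
    for u, outs in graph.items():
        for v in outs:
            incoming[v].append(u)
    for _ in range(iters):
        changed = False
        new = dict(ranks)
        for v in graph:
            if converged[v]:
                continue                      # skip already-converged vertex
            r = (1 - damping) / n + damping * sum(
                ranks[u] / len(graph[u]) for u in incoming[v] if graph[u])
            if abs(r - ranks[v]) < tol:
                converged[v] = True           # freeze this vertex's rank
            else:
                changed = True
            new[v] = r
        ranks = new
        if not changed:
            break
    return ranks
```

On graphs where most vertices settle early, the skipped updates are where the iteration-time savings described above come from.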
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf – GetInData
Recently we have observed the rise of open-source Large Language Models (LLMs) that are community-driven or developed by the AI market leaders, such as Meta (Llama3), Databricks (DBRX) and Snowflake (Arctic). On the other hand, there is a growth in interest in specialized, carefully fine-tuned yet relatively small models that can efficiently assist programmers in day-to-day tasks. Finally, Retrieval-Augmented Generation (RAG) architectures have gained a lot of traction as the preferred approach for LLMs context and prompt augmentation for building conversational SQL data copilots, code copilots and chatbots.
In this presentation, we will show how we built upon these three concepts a robust Data Copilot that can help to democratize access to company data assets and boost performance of everyone working with data platforms.
Why do we need yet another (open-source ) Copilot?
How can we build one?
Architecture and evaluation
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Round table discussion of vector databases, unstructured data, ai, big data, real-time, robots and Milvus.
A lively discussion with NJ Gen AI Meetup Lead, Prasad and Procure.FYI's Co-Found
2. PRESENTATION BLURB
Every researcher is a cyborg! Academic researchers engage in various sorts of research in vitro (in the glass) and in vivo (in the living body), or they engage in experimental laboratory work and analyze data in natural in-world experiments. In between, many conduct surveys, focus groups, interviews, and other types of research work. In the computer-assisted qualitative data analysis software (CAQDAS) space, NVivo is one of the foremost tools, enabling the creation of manual codebooks, multimedia analysis, and various forms of “auto” or unsupervised machine learning. NVivo works as a “database” for structured and unstructured data (multimedia). It enables the drawing of content from various social media sites. Technologies augment human analytical capabilities in both the qualitative and quantitative research spaces. This presentation demonstrates some of the capabilities of NVivo. It also addresses how a researcher is changed by the computational capabilities they harness.
3. DEFINITION: CYBORG
Cyborg: “A fictional or hypothetical person whose physical
abilities are extended beyond normal human limitations by
mechanical elements built into the body”
Oxford English Dictionary (2022)
5. IN VITRO VS. IN VIVO
In vitro (in glass)
Research that may be conducted “in glass” test tubes
in laboratories
More typical in the so-called “hard” sciences
In vivo (in living body)
Research that may be conducted based on “in living
body” or in-world natural experiments (based on
observables, scraped data from real life)
More typical in the so-called “soft” sciences
NVivo
6. MIXED METHODS VS. MULTIMETHODOLOGY RESEARCH
Mixed methods research
A combination of qualitative and quantitative “data,
methods, methodologies, and / or paradigms in a
research study or set of related studies”; a type of
multimethodology research (Multimethodology, Jan.
25, 2022)
Multimethodology research
Use of “more than one method of data collection or
research in a research study or set of related
studies” (Multimethodology, Jan. 25, 2022)
7. SOME EXAMPLES OF MULTIMETHODOLOGY (AND MIXED METHODS) RESEARCH
An experimental intervention (quant) and a follow-up online survey (qual) (sequential multimethod research)
A program performance audit based on documentation and data (quant) and interviews (qual) (multi-sourced
data, multimethod research: content analysis, interviews)
A simulation study (quant) combined with social data and social network analysis (qual) (multimethod and multi-
sourced data)
Medical trials of a new drug (quant) along with long-term participant health data and surveys (qual) (multimethod
research)
Longitudinal research combining laboratory-based health data (quant) and surveys (qual) (multimethod research)
Autoethnography or ethnography (qual) studied in the context of external population data (quant) (multi-sourced
data of both types)
8. SOME EXAMPLES OF MULTIMETHODOLOGY (AND MIXED METHODS) RESEARCH (CONT.)
Scientific research in the lab (quant) combined with external focus groups (or interviews or surveys) (qual)
(multimethod and mixed method research)
A quasi-experimental learning intervention (quant / qual) with assessment of grade data (quant)
Learning management system (LMS) data at scale (quant) combined with student surveys (qual) (mixed data)
Social media data (quant) combined with e-Delphi method study (qual) (mixed data)
Student grades (quant) and student survey responses (quant / qual) (mixed data)
Online-based interviews (qual) and sensor data (quant) (multimethods, mixed data sources)
9. SOME EXAMPLES OF MULTIMETHODOLOGY (AND MIXED METHODS) RESEARCH (CONT.)
An oral history project (qual) with computational text analysis (quant / qual) with demographic data (quant)
(mixed data)
Mapping the state of a nation’s research by bibliometrics (quant / qual) and demographic analysis (quant) and
interviews (qual) (multimethod and mixed method)
And innumerable other variations
10. CAQDAS: COMPUTER-ASSISTED QUALITATIVE DATA ANALYSIS SOFTWARE
computer-assisted qualitative data analysis software (CAQDAS)
Includes a wide number of software programs, including…
NVivo
[data exploration with word frequency counts, text searches; matrix queries; qualitative cross-tab analysis; compound queries; coding
queries; coding similarity analysis; manual coding; codebook export; memo export; reports export; machine learning: topic modeling (with
human researcher in-the-loop), sentiment analysis, speaker coding from transcripts, style coding, “NV” coding based on manual codebook;
data visualizations; manual model drawing; automated model drawing, and others]
[runs on Windows, Mac, and servers] [some differing capabilities]
11. A LIGHT COMPARISON / CONTRAST BETWEEN QUANT AND QUAL
APPROACHES
Quantitative research
Epistemological approaches (ways of knowing, ways
of making meaning)
Assumption of objectivity and absolutism, normal curve
to represent populations
Striving for high-rigor and reproducible research
Practical and applied, problem-solving; theoretical
relevance and implications
Qualitative research
Epistemological approaches (ways of knowing, ways
of making meaning)
Assumption of subjectivity and relativism on the part of
researchers
Striving for rich data (coded to saturation)
Practical and applied, problem-solving; also theoretical
relevance and implications
12. A LIGHT COMPARISON / CONTRAST BETWEEN QUANT AND QUAL
APPROACHES (CONT.)
Quantitative research
Experimental research
Gold standard is experimental research
Lab-based
Field-based
High-precision measures, highly defined research
methodologies, high rigor
Qualitative research
Natural experiments, field observations,
Data elicitations through focus groups, interviews
(structured and semi-structured)
Valuing of voice
Informants based on positionality
All content has data value: content analysis, gray
literature, metadata
13. A LIGHT COMPARISON / CONTRAST BETWEEN QUANT AND QUAL
APPROACHES (CONT.)
Quantitative research
Reliance on statistical analysis, descriptive statistics, other
statistical methods, deductive logic
Independent variable(s), dependent variable(s)
Controls for potential other influences (noise)
Evaluate whether p-values justify rejecting null hypotheses
Use randomization for seating panels, participants, and so on
Can go with convenience samples, can go with snowball
sampling, and others, but these are weaker sampling methods,
with room for biasing
Require “power” in terms of numbers for representation
Qualitative research
Reliance on researcher expertise, thematic (and other) coding,
statistical methods
Can learn from small datasets
Can learn from an n = 1
Can learn from individual cases / case studies / groups of cases
Can make case for a construct based on coding similarity
analysis (using Cohen’s Kappa, Kappa Coefficient)
Usually in a range of 0.6 to 0.8, where 1.0 is full agreement on what
is relevant and what is not relevant in the coding
Need to avoid “reification” (assuming an abstraction has
instantiation in concrete reality), “hallucinated” senses of reality
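The Cohen's Kappa calculation behind a coding similarity analysis can be illustrated in a small, stand-alone sketch (this is not NVivo's implementation, and the two coders' "relevant" / "irrelevant" labels below are hypothetical):

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Chance-corrected agreement between two coders' category labels."""
    assert len(coder_a) == len(coder_b)
    n = len(coder_a)
    # Observed agreement: fraction of items both coders labeled the same.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Expected agreement by chance, from each coder's label frequencies.
    ca, cb = Counter(coder_a), Counter(coder_b)
    expected = sum(ca[k] * cb[k] for k in set(ca) | set(cb)) / (n * n)
    return (observed - expected) / (1 - expected)

a = ["relevant", "relevant", "irrelevant", "relevant", "irrelevant", "relevant"]
b = ["relevant", "irrelevant", "irrelevant", "relevant", "irrelevant", "relevant"]
print(round(cohens_kappa(a, b), 3))  # → 0.667
```

A value of 1.0 would mean full agreement; values in the 0.6 to 0.8 band are conventionally treated as acceptable, per the slide above.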
14. A LIGHT COMPARISON / CONTRAST BETWEEN QUANT AND QUAL
APPROACHES (CONT.)
Quantitative research
Experimental reproducibility and repeatability
Generalizability of certain standards are met
Qualitative research
Not striving for generalizability but for patterns and
insights
No assumption of being able to totally recreate a
prior qualitative study
May do follow-on studies with the “same” population
15. A LIGHT COMPARISON / CONTRAST BETWEEN QUANT AND QUAL
APPROACHES (CONT.)
Quantitative research
Assumption: Interchangeability of similarly trained
researchers
Integrity required
Complex skills required
Qualitative research
Assumption: A non-interchangeability of researchers
Celebration of the researchers’ unique interpretive lens
Uniqueness of researchers as a strength
Willingness to challenge status quo, cultural understandings;
be transgressive and revolutionary
Openness to novel experiences
Ability to take on challenging work
Control of own cognitive biases, perceptual slants,
preferential thinking
16. A LIGHT COMPARISON / CONTRAST BETWEEN QUANT AND QUAL
APPROACHES (CONT.)
Quantitative research
Work can be challenged:
Repeat of the experimental research but with new data
Finding of errors in the original handling and / or
analysis of the data
Unclear evidentiary chains
Finding of logic errors
Poor methodologies
Identification of research or other fraud
Qualitative research
Work can be challenged:
Finding of incorrect application of logic or theory
Insufficient richness of data
Researcher biographical bias
Poor methodologies
Identification of research or other fraud
17. SOME COMPUTATIONAL DIFFERENCES
Quantitative research
Statistically significant data patterns through…
Cross-tab analysis
Factor analysis
Principal components analysis
Cluster analysis
Network analysis, social network analysis, word networks,
related tags networks, and others
And others
Qualitative research
Focus on natural language analytics
Spoken, written, mixed
Various genres and forms
Harnessing of multimedia, gray literature, various “found”
contents, and others
Data elicitations using computational means
Data patterns through…
Topic modeling
Sentiment analysis
Predictive analysis
Qualitative cross-tab analysis
Text and data mining
18. SOME COMPUTATIONAL DIFFERENCES (CONT.)
Quantitative research
Machine learning
Supervised machine learning
Unsupervised machine learning
Data modeling from machine learning based on training
data (such as for predictive analytics, with automated
creation of confusion matrices and f-scores)
Artificial intelligence (AI)-based “experiential” learning
High performance computing with big data and big data
streams
Qualitative research
Can be applied at scale now
Can “remember” unique coding fists of unique
researchers and apply their coding computationally
19. COMPUTATIONAL INSTRUMENTATION / TOOLS
AND DIGITAL RESOURCES
Quantitative research
Software programs, code, script, macros
Curated datasets
Datasets
Data models
Connected script and datasets
Survey instruments
Interview instruments
Qualitative research
Manual codebook creation, automated codebook
creation (both coded to saturation)
Created with top-down coding (based on theory or
framework or model, or some combination; based on pre-
determined research questions; based on a priori
hypothesizing); bottom-up coding (grounded theory); both
top-down and bottom-up coding
Codebooks named and often with easy-reference acronyms
.qdc format for digital codebook sharing and heritability,
Microsoft Word or LaTeX formats for appendices
20. COMPUTATIONAL INSTRUMENTATION / TOOLS
AND DIGITAL RESOURCES (CONT.)
Quantitative research
Research journals
Field notes
Rubrics
Matrices
Checklists
Qualitative research
Coding dictionaries
Software programs
Curated datasets
Research journals
Field notes
Rubrics
Matrices
Memos
22. TYPICAL RESEARCH DATA SHARING PRACTICES
Quantitative research
Full dataset (into perpetuity, at the time of publication)
Data exploitable (in a constructive sense) for other
analyses (but need to cite the creator of the dataset)
The code used to interact with the data and to create
data visualizations
Sometimes derived or “shadow” datasets
De-identified data (privacy protections)
Canonical collections (image sets, video sets, others) for
further study
Clear evidentiary chains
Qualitative research
Project files sometimes
De-identified data (privacy protections)
Instruments may be shared, like codebooks (manual and
digital) and computer programs
Partnerships more rare than in quant research teams
23. (IN)CONCLUSIVENESS OF RESEARCH FINDINGS?
Quantitative research
Convergence towards a consensus
Not fully definitive for all time (may be overturned at
any point with new research)
No absolute “proof” in most cases but leaning in
certain directions
Even paradigms shift
Reproducibility of computational outcomes given the
same dataset and the same queries or autocoding
processes
Qualitative research
Never an absolute last word, but a momentary
provisional observation for a particular point-in-time
No absolute “proof” in most cases
Even paradigms shift here, too
Reproducibility of computational outcomes given the
same dataset and the same queries or autocoding
processes
24. HUMAN SUBJECTS RESEARCH AND STANDARDS
Professional ethics and regulations / laws that protect the following and more:
IRB (institutional review board) oversight prior, during, and post research
Non-use of duplicity except in rare approved-by-IRB cases
Research value
Legally procured data
Research subjects’ well-being
Informed consent for research subjects, ability to withdraw from the research at any time
Research subjects’ privacy
Data preservation
26. SELECTED QUALITATIVE RESEARCH PRECEPTS
It helps to have broad and general knowledge along with in-depth focused knowledge. All knowledge can inform
the work.
There are data everywhere. Everything is datafy-able.
Everything is culturally informed. Everything is seen through a cultural lens. It helps to be aware of culture, one’s
own and others’.
The data source may be anywhere from [raw and “found” in-world] to [refined, edited, vetted, and “worked
through”].
27. SELECTED QUALITATIVE RESEARCH PRECEPTS (CONT.)
With some work, data may be transcoded to information.
All human creations have potential informational value:
formal published work, gray literature (brochures), private letters, cultural artifacts, artworks, commenting on social media,
building designs, stamps, candy wrappers, private collections of anything, etc.
The informational value may differ based on research context and researcher interests. Different researchers will
extract different meanings from the same dataset.
28. SELECTED QUALITATIVE RESEARCH PRECEPTS (CONT.)
All researchers are subjective. They have built-in biases. They need to be self-aware and control for their own
biases in order to conduct effective research.
In their work, they need to report on their biases and how they mitigate their biases.
Different researchers approaching a particular topic will likely take different approaches and emerge with different findings
(to a degree).
Researchers have their own “coding fists” or “coding hands.” They identify relevant data differently. They create
different coding categories, and these categories may be mutually exclusive or not. They may engage greedy or
frugal coding (whether a coded object can be coded more than once and in different categories).
Computation enables researcher “coding fists” to be preserved and re-used into the future.
29. SELECTED QUALITATIVE RESEARCH PRECEPTS (CONT.)
The research findings are not about generalizing to a population per se but about surfacing relevant insights.
Researchers strive to see differently. One of their “superpowers” is in re-interpreting, at various levels: micro
(ego), meso (group, entity), and macro (larger systems).
Researchers work across cultures and contexts. They are able to disengage from the context in order to view the situation
analytically.
Values (stated and implied) are an integral part of the research.
Qualitative researchers do not assume that the status quo is all as it should be. In qualitative research, advocating
for social change and equity and justice is considered a professional responsibility.
Studies may be disciplinary or interdisciplinary.
30. SELECTED QUALITATIVE RESEARCH PRECEPTS (CONT.)
CAQDAS tools support the human researcher.
The human researcher is foremost in the research and is not displaced by the
technology. However, the human is changed by using technologies, too.
One graduate student wanted to use autocoding alone for her master’s thesis,
without bringing her own expertise to bear. Not a good idea… Unless you can
create the code for the data analytics informed by your knowledge, a generalized
software tool will output generalized insights.
CAQDAS enables scalability of various types of computational analytics. For
example, a human-created manual codebook in NVivo can be applied to a
larger dataset and coded with a Cohen’s Kappa coefficient of 1.0 (albeit in a
machine sensibility, not a human one).
31. TEMPORAL LEGACIES OF QUALITATIVE RESEARCHERS
A lifetime body of work
Particular research works
Unique or powerful contributions to particular
insights, theories, practices, and others
Coining of new terms
Originating new research methods, how research is
operationalized
Research instruments
Ability to reach the lesser-reached
Language skills
Professional and other affiliations
Professional collaborations
Personality, persona, charisma
Promotion of social change, advocacy for certain
values
Effective funding and uses of available resources
Style(s) and aesthetics
And others…
34. WHAT IS “CODING” ANYWAY?
Manual coding
Reading collected data (transcripts, articles, maps,
audio, video, photos, etc.) and identifying elements of
interest and coding them to a codebook in natural
language
Organizing the codebook in a rational order, with
child nodes, grandchild nodes, great grandchild
nodes, etc. (structured codebook)
Automated coding
Distant reading by machine using the following:
Word counts
Algorithmic topic extraction
Application of sentiment dictionaries to text at varying
levels of granularity (sentences, paragraphs, or data
cells…depending on the formatting of the textual data)
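The word-count form of "distant reading" listed above can be approximated in a few lines of standard-library Python (a simplified sketch; the stoplist and sample sentence are illustrative, and real stoplists are much longer):

```python
import re
from collections import Counter

# A tiny illustrative stoplist -- built-in stoplists in CAQDAS tools are far longer.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "that"}

def word_frequencies(text, top_n=5):
    """Tokenize, drop stopwords, count: the simplest form of distant reading."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS).most_common(top_n)

sample = "The coder codes the data, and the codes structure the codebook."
print(word_frequencies(sample, 3))
```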
35. WHAT IS A CODEBOOK ANYWAY?
A basic codebook contains the following: coding nodes (classifications of codes) and descriptions for each node
so that coders understand what information belongs in that classification
A codebook may be hierarchical, with top-level nodes, child nodes, grandchild nodes, and so forth
The nodes may be sectioned based on topics. They may be sectioned in alphabetical order. They may be ordered
with leading 0s. There are many accepted ways for the ordering of the codes.
In table format, a codebook looks like the following (in the simplest construct):
Codes | Descriptions of Coding Categories
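A structured codebook with top-level, child, and grandchild nodes can be modeled as nested code/description entries. The fragment below is hypothetical (not an NVivo file format), just to make the hierarchy concrete:

```python
# A hypothetical fragment of a hierarchical codebook: each node pairs a
# description with optional child nodes, like the table above but nested.
codebook = {
    "Workload": {
        "description": "References to amount or pacing of work",
        "children": {
            "Overload": {"description": "Work exceeds capacity", "children": {}},
            "Balance": {"description": "Sustainable workload", "children": {}},
        },
    },
}

def flatten(nodes, depth=0):
    """Walk parent -> child -> grandchild nodes in document order."""
    for name, node in nodes.items():
        yield depth, name, node["description"]
        yield from flatten(node["children"], depth + 1)

for depth, name, description in flatten(codebook):
    print("  " * depth + f"{name}: {description}")
```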
36. WHY SHOULD A CODEBOOK HAVE A NAME?
A codebook should have a name that describes what is coded by that codebook. The foci and discipline should
be identifiable.
A codebook name should have a clear acronym, for easy reference.
A codebook needs a name so that it is easily citable by other researchers.
A codebook needs a name so that researchers can credit the original codebook instrument creator when they
use the codebook…or when they create a module to add to it, etc.
A codebook should have a name because of how it is used in the research and academic space.
A codebook shows a culmination of expertise…and expert interactions with a sufficient amount of relevant data.
37. THINK IN SEQUENCING
What are data patterns in a particular set of core data files? (word frequency counts, text searches, topic
modeling, and others)
What are proxemic terms around particular names, dates, labeled phenomena, symbols, and others? (proximity
searches)
Who are the different individuals who responded to the survey / focus group / interviews based on demographic
data? Based on topics of interest? Based on general sentiment? (classification sheets w/ demographic
information and case nodes, topic modeling, sentiment analysis, and others)
What are features of the created manual coding? Automated coding? (matrix coding queries)
In a sentiment analysis of a social network’s discussion, what topics are seen in the most positive sentiment?
Which topics are seen in the most negative sentiment? (sentiment analysis, topic modeling)
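The proximity search mentioned above — which terms co-occur near a particular name, date, or labeled phenomenon — can be sketched as a sliding window over tokens (illustrative only; the sample text is hypothetical, and real proximity operators are more capable):

```python
def proximity_terms(text, target, window=3):
    """Collect terms appearing within `window` words of each target occurrence."""
    words = text.lower().split()
    hits = set()
    for i, w in enumerate(words):
        if w == target:
            lo, hi = max(0, i - window), i + window + 1
            hits.update(words[lo:i] + words[i + 1:hi])
    return hits

text = "the committee approved the budget after the budget review meeting"
print(sorted(proximity_terms(text, "budget", 2)))
```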
38. THINK IN SEQUENCING (CONT.)
In conducting a review of the literature, a large number of files have been downloaded from various subscription
and open-source web-facing databases. The research is focused on a particular subset of articles. The researcher
does not want to read all the articles. How can the researcher hone in on the particular works of interest?
(topic modeling by article set; topic modeling by titles and abstracts; word searches in the database of articles)
39. THINK IN SEQUENCING (CONT.)
In a geographical analysis of responses, what are topical and sentiment patterns and attitudes? (classification
sheets, geographical modeling, topic modeling, sentiment analysis)
In creating a team’s consensus codebook, based on collected .nvp and .nvpx project files (or even server files),
how do the various human-generated manual codebooks differ? What are the outlier ideas? (event logging for
objective record of individual researcher / coders and contributions, matrix coding query, coding comparison for
Kappa coefficient, transcoding of project files from / to Windows, Mac, server)
In the “use existing coding patterns” in which “NV” (the software) codes by emulating the human-generated
codebook, various individuals’ and teams’ coding fists are emulated computationally…to scale…to computational
speeds. This enables preservation of people’s points-of-view and coding patterns. What are ways to ensure that a
codebook is coded to saturation, since this feature does not add any new nodes (coding classifications)? (use
existing coding patterns)
40. THINK IN MULTIPLE NVIVO FILES
When working on a large-size or longitudinal project (including doctorate degrees), use a number of files to
achieve your aims.
Make a file for the review of the literature. Make a file for the focus group. Make a file for the fieldwork. Make a file for
the social video analysis. Make a file for the analysis of the geographical maps.
Combine data only when you need to run data queries and / or autocoding on the particular set of information.
Do not clump everything into a large file unless your queries require access to all the included data.
Always have a backup set of files in the cloud or in multiple physical locations (so as not to lose work
accidentally).
41. THINK IN MULTIPLE LANGUAGES
NVivo enables coding in a number of languages:
simplified Chinese
English (US)
English (UK)
French
German
Japanese
Portuguese
Spanish
UTF-8 and UTF-16 enable representation of all languages on the Web and Internet.
42. THINK IN TEAMING
Aim for wide dissensus when originating a team codebook, so that the widest variety of ideas may be captured
initially before there is convergence to a consensus codebook
Aim for narrower consensus when training a team to use a defined codebook on defined data for a sufficiently
high Cohen’s Kappa / Kappa coefficient to establish the validity of a construct
44. ABOUT NVIVO
NVivo is a qualitative data analytics software tool that acts like a database (that enables the storage of structured
and unstructured data, the running of queries, the interaction with data, the drawing of data visualizations, the
export of reports, and so on)
Earlier versions of the software were known as NUD*IST (1981 – 1997); from N4 onward, the
software has been branded NVivo (1997 to present) (“NVivo,” Oct. 14, 2021)
NUD*IST stood for “Non numerical Unstructured Data Indexing Searching and Theorizing software”
45. BASIC SELECTED ANALYTICAL CAPABILITIES OF NVIVO INCLUDE…
Exploration of data
Word frequency count
Text search with various parameters
Similarity cluster analysis
Coding analysis
Matrix coding
Qualitative crosstab analysis (with case nodes and
classification sheet data)
Coding comparison (with Cohen’s Kappa / Kappa
coefficient)
Compound queries
Group queries
Locational geographical mapping from social media
data
Ego neighborhood mapping for following network in
directed graphs
Various data tables
Various data visualizations (dendrograms, treemap
diagrams, word trees, ring lattices, cluster diagrams
2d, cluster diagrams 3d, and others)
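Similarity cluster analysis typically rests on a pairwise document-similarity measure; one common choice is cosine similarity over word counts. The sketch below uses that measure as an illustration, not as NVivo's specific metric:

```python
import math
from collections import Counter

def cosine_similarity(text_a, text_b):
    """Cosine similarity over raw word counts of two documents."""
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

print(round(cosine_similarity("coding the data", "coding the codebook"), 3))  # → 0.667
```

Documents with pairwise similarities above a threshold can then be grouped, which is the basis of the cluster diagrams listed above.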
46. BASIC SELECTED ANALYTICAL CAPABILITIES OF NVIVO
INCLUDE…(CONT.)
Autocoding from data (various forms of machine
learning)
Topic modeling (“distant reading” of texts and
extraction of topics)
Sentiment analysis
Coding by style
Coding by name in transcript
Use of existing coding patterns (machine copies human
manual codebook to dataset scale and computation
speed)
Autocoding from survey downloads (to case nodes
and topic modeling and sentiment analysis)
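Dictionary-based sentiment autocoding at sentence-level granularity can be sketched as follows. The positive/negative lexicons here are tiny illustrative stand-ins for a real sentiment dictionary:

```python
# Tiny illustrative lexicons -- stand-ins for a real sentiment dictionary.
POSITIVE = {"good", "great", "helpful", "clear"}
NEGATIVE = {"bad", "confusing", "poor", "slow"}

def sentence_sentiment(sentence):
    """Positive lexicon hits minus negative lexicon hits for one sentence."""
    words = [w.strip(".,!?") for w in sentence.lower().split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def classify(text):
    """Apply the dictionary at sentence-level granularity."""
    for s in text.split("."):
        if s.strip():
            score = sentence_sentiment(s)
            label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
            yield s.strip(), label

for sentence, label in classify("The workshop was great. The interface is confusing."):
    print(f"{label}: {sentence}")
```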
47. SOME SCREENSHOTS FROM AN ORIGINAL DEMO
PROJECT
FROM NVIVO 12 (ONE VERSION PRIOR TO LATEST) AND NVIVO (LATEST VERSION) ON WINDOWS
59. SOME ANALYTICAL APPLICATIONS OF NVIVO IN THE RESEARCH
LITERATURE
Manual coding of various research data and the extraction of manual codebooks (based on a variety of target
topics)
Reproducible / repeatable autocoded topic modeling to compare against human coding
Autocoded sentiment analysis (positive or negative sentiment) of text sets
Respondent profiling by topics of focus and sentiment
Codebook analysis (analysis of the code, whether manual or autocoded or combined)
Qualitative cross-tab analysis for data patterns of respondents by various attributes (demographic and others)
Social media data extractions (tweetsets from a microblogging site, poststreams from a social networking site,
social video from a social video sharing site with comments, and so on)
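The qualitative cross-tab analysis listed above — coded references tabulated against respondent attributes from a classification sheet — reduces to counting (code, attribute) pairs. The respondents, codes, and roles below are hypothetical:

```python
from collections import defaultdict

# Hypothetical coded references (respondent, code) plus a classification
# sheet mapping respondents to a demographic attribute (here, a role).
codings = [("r1", "workload"), ("r1", "tools"), ("r2", "workload"),
           ("r3", "tools"), ("r3", "tools"), ("r2", "morale")]
classification = {"r1": "faculty", "r2": "staff", "r3": "faculty"}

def crosstab(codings, classification):
    """Count coded references per (code, attribute) cell."""
    table = defaultdict(int)
    for respondent, code in codings:
        table[(code, classification[respondent])] += 1
    return dict(table)

print(crosstab(codings, classification))
```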
61. STRUCTURED VS. UNSTRUCTURED / SEMI-STRUCTURED DATA
Structured data
Labeled data in data tables
Each value in a cell is labeled by the column header and
the row header
Each value in a cell is identified by type of data with
attendant features
Unstructured, semi-structured data
Text
Imagery
Audio
Video
Multimodal, multimedia-based
* The argument for “semi-structured” vs. “unstructured”
is that there is no absolutely unstructured data unless
it’s randomness (even pseudo-randomness is not fully
unstructured). Natural language has an inherent
structure. Ditto storytelling, audio, video, and so on.
62. TYPES OF USABLE DATA IN NVIVO
Text files (incl. pdf)
Image files (maps, screenshots, photos, diagrams, and
others)
Audio files
Video files
Survey data (Qualtrics, Survey Monkey)
Web bibliography sources
Online notetaking sites
Email message and identity data
Excel workbooks
SPSS datasets (note the tie to quant methods from a
qual analytics tool…and vice versa)
NVivo projects (for team collaborations)
.qdc codebooks, .docx codebooks
(code category names and descriptions for what goes into
each category, not exemplars within the categories)
NVivo memos, NVivo reports, and others
* For multimedia, there have to be text equivalencies for
the imagery, audio, and video (transcripts)
63. USING CLEAN DATA
To have clean data, select the files purposefully.
Take out personally identifiable information (PII).
Ensure that metadata does not carry sensitive information (in the imagery, in the text files, in the video files, etc.)
Do not digitally annotate the files before you ingest those into the NVivo project, or you’ll have introduced noise
into your data. (If you want to annotate files, do so, but keep those files separate from the pristine ones that will
be ingested into the .nvp or .nvpx files.)
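As one small part of taking out PII, pattern-based scrubbing of emails and phone numbers can be sketched with regular expressions. These are naive, illustrative patterns; a vetted de-identification tool should be used for real human-subjects data:

```python
import re

# Naive, illustrative patterns for emails and US-style phone numbers; a
# vetted de-identification tool should be used for real research data.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.\w+")
PHONE = re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b")

def scrub(text):
    """Replace pattern-matched PII with placeholder tokens."""
    return PHONE.sub("[PHONE]", EMAIL.sub("[EMAIL]", text))

print(scrub("Contact Jo at jo@example.com or 785-555-0100."))  # → Contact Jo at [EMAIL] or [PHONE].
```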
64. ENSURING USABLE DATA TABLES AND DATA VISUALIZATIONS
OUTSIDE OF NVIVO
It helps to export data tables and data visualizations from NVivo unless you will have a forever license and assume that
the software will be forever available. NVivo is proprietary software, and you will need a version of the software to
open NVivo files.
An older version of NVivo cannot open newer .nvp or .nvpx files. Upgrading files will mean that the upgraded version of the
software is needed.
Record the exported contents with clear, consistent file-naming protocols, so you know what you are looking at when
you access the files later. Document the parameters used to extract each data table or data visualization and to run
various data queries and machine learning sequences, so you can represent them clearly in a publication or
presentation. Review the data; do “sanity checks” of the data before exporting and saving. [The data analytics process
is sequence-sensitive. The order of operations affects data at each step and the ultimate outcomes. Error introduced
at any one point potentially amplifies.]
Any raw data you ingest into an NVivo project should also be stored outside the project, so it is available for
reference external to the project file.
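Documenting the parameters of each export can be partially automated. A minimal provenance log might append one JSON record per exported table or visualization; the file name and parameter keys below are hypothetical:

```python
import datetime
import json
import os
import tempfile

def log_export(export_path, query, parameters, logfile):
    """Append one provenance record per exported table or visualization."""
    record = {
        "file": export_path,
        "query": query,
        "parameters": parameters,
        "exported_at": datetime.datetime.now().isoformat(timespec="seconds"),
    }
    with open(logfile, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record

# Hypothetical usage: one line per export, written to an append-only log.
log_path = os.path.join(tempfile.gettempdir(), "export_log.jsonl")
rec = log_export("wordfreq_focusgroup1_2024-05-01.csv",
                 "word frequency query",
                 {"min_word_length": 4, "grouping": "with stemmed words"},
                 log_path)
```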
65. DATA EXTRACTIONS FROM SOCIAL MEDIA
USING NCAPTURE (WEB BROWSER ADD-ON TO GOOGLE CHROME)
66. SOCIAL MEDIA DATASETS
Profiling social groups and sub-groups
Capturing a sense of mass mood / sentiment around particular topics
Identifying the most high-degree social nodes in a social network; mapping the social network to understand
dynamics
Mapping http networks from social media
Analyzing social images on social media (for content, for sentiment, for identified peoples, and others)
Identifying synthetic persons (‘bots)
Identifying general geographical locations of respective linked social accounts
67. ABOUT SOCIAL MEDIA DATA AND NCAPTURE
NCapture works on Google Chrome (and the aging-out unsupported Internet Explorer / IE web browser)
Various social media platforms are not supported in IE now, so using NCapture does not enable access to the
various platforms, like Facebook / Meta
Developers are working on a bridge to Facebook via Chrome, but that has not been available for many months
Access to social media data is rarely an n = 1 (without paying for the data from the social media platform
provider or a third-party source)
Given dynamism in the space (due to various dependencies and other factors), if you have a chance to collect the
social data, do so. Do not assume that the chance will always be there.
Take a screenshot of the landing page of the social account you’ve profiled, so you have the “state” of the account
at the time of the data capture. (You may have to go deeper for more than summary data.)
You can scrape images using third-party web browser add-ons to capture that data in thumbnail format.
68. LIMITS TO COMPUTATIONAL TEXT ANALYSIS IN NVIVO (IMHO)
Some limits:
may view words as individual unigrams and not bigrams, trigrams, four-grams, etc.; does not capture phrases
may have an insufficient stopwords list
does not understand negatives
does not understand humor
does not understand irony
does not understand external referents to a text or text corpus
machine logic and not human logic in “use existing coding patterns” machine emulation of human coding
is limited by human manual coding when using “use existing coding patterns”
does not capture an n = all in social media accounts with NCapture (unless the accounts have a limited amount of Tweet or
poststream contents) … given API (application programming interface) limits on the various social media platforms
There are commercial ways to acquire n = all datasets, but these require queries run online on “big data” datasets (using different data query
methods, like versions of structured query language)
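The unigram limitation noted above is easy to see against a generic n-gram extractor (a standard-library sketch, unrelated to NVivo's internals):

```python
def ngrams(tokens, n):
    """Sliding-window n-grams over a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "mixed methods research design".split()
print(ngrams(tokens, 2))  # → [('mixed', 'methods'), ('methods', 'research'), ('research', 'design')]
```

A unigram-only tool counts "mixed", "methods", "research", and "design" separately and misses the phrase "mixed methods" entirely.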
70. INDIVIDUALS AS RESEARCH INSTRUMENTS…
An individual researcher is a “research instrument”.
A group of researchers is a collaborative “research instrument”.
Researchers are sentients, and their aperture and vision and methods inform their power and capability.
Their social connections are part of their power and capability.
Their positionality—which they can change—affects what they have access to and what they can achieve.
71. CYBORGS HAVE MORE SKILLS TO DEPLOY
The building up of new knowledge and new skills in the computational space can enable an extension of the
researcher capabilities.
A “cyborg” is a “bionic” personage, who is both flesh and machine (as technical enhancement).
CAQDAS and other data analytics software have a forcing function: they force a researcher to become more
precise, to explicate, to explore, and ultimately to form a sense of the target research topic.
Technologies change the researcher. [Some see this as a strength. Others see this as a threat. People choose how to
wield and apply certain tools.]
72. THE “BIONICS” FOR THE RESEARCHER INCLUDE THE
FOLLOWING…
Technologies enable the capture of otherwise-inaccessible data, in vitro, in vivo, and in cyber. They extend collection.
Technologies enable varied discovery and exploration of the available data. Technologies enable rich review of
data. They extend perception.
Technologies enable permanent archival of data. They extend memory.
Technologies expand the set of askable research questions…and enable the testing of various hypotheses. They extend
asking and hypothesizing. They extend thinking.
Technologies can enable complementary insights to those attained by manual methods alone. They extend
learning. They extend conceptualization.
73. “BLACK BOX” ELEMENTS IN RESEARCH AND DATA ANALYSIS
There are “black box” elements to both human cognition and
computational machines and methods.
For all the effort that goes into transparency in research and
computational sequences, there are “inexplicables” (as in ANNs
and how neural networks process data, although computer
scientists are getting closer to some explanations; as in human
intuitions and cognitive leaps; as in the workings of the human
subconscious and unconscious).
Then again, not everything has to be fully understood and
explicated.
75. GETTING STARTED WITH CAQDAS
CAQDAS is Computer-Assisted Qualitative Data Analysis Software.
Study how computation is applied to various qualitative data analytics challenges…and what may be asserted from
various analytics.
Explore the available software tools and their respective capabilities.
Decide which software programs provide the capabilities you will use.
Decide which software tools have a comfortable user interface.
There are some free software tools available, if you’re comfortable with command line (vs. graphical user interface).
Go for it! Start slow. Be nice to yourself. Be nice to others. Build the skillset. Share your knowledge, skills, and
abilities (KSAs).
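As a taste of what the free, code-driven end of the spectrum looks like, here is a minimal keyword-based coding sketch in Python. (The codebook and interview segments are invented for illustration; real CAQDAS tools add code hierarchies, memoing, and inter-rater comparison on top of this basic mechanic.)

```python
# Hypothetical codebook: code name -> trigger keywords.
CODEBOOK = {
    "access": {"access", "barrier", "afford"},
    "trust": {"trust", "confide", "privacy"},
}

def code_segment(segment, codebook=CODEBOOK):
    """Return the sorted list of codes whose keywords appear in a segment."""
    words = {w.lower().strip(".,;!?") for w in segment.split()}
    return sorted(code for code, keys in codebook.items() if words & keys)

segments = [
    "I could not afford the software license.",
    "Privacy was my biggest worry; I did not trust the platform.",
]
for seg in segments:
    print(code_segment(seg), "<-", seg)
```

Even this toy version shows the forcing function mentioned above: building the codebook requires explicating what each code actually means in words.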
76. CONTACT
Dr. Shalin Hai-Jew
ITS
Kansas State University
shalin@ksu.edu
785-532-5262
Gentle caveat: This presentation uses one software tool to bridge to CAQDAS. There are many other tools and
methods and capabilities…
Resource: Using NVivo: An Unofficial and Unauthorized Primer
77. EXTRA: AND A DIFFERENT FAVORITE CAQDAS TOOL: LIWC-22
LIWC-22*
[validated instrument trained on a number of natural-language datasets; measures psychometrics, linguistic features, and sentiment;
provides four summary scale scores: (1) analytical thinking, (2) clout, (3) authenticity / warmth, (4) emotional tone (sentiment), among
other foci; enables custom dictionaries focused on various objectives; has versions in a number of different languages]
[related to a variety of insightful research]
[runs on Windows]
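The basic mechanic behind dictionary-based tools of this kind can be sketched as a word-category percentage. (LIWC-22’s validated dictionaries and scaled scores are proprietary; the category words below are made-up stand-ins, not LIWC’s.)

```python
def category_percent(text, category_words):
    """Percent of tokens matching a category dictionary -- the core
    move in dictionary-based text measurement."""
    tokens = [w.lower().strip(".,;!?") for w in text.split()]
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in category_words)
    return 100.0 * hits / len(tokens)

# Hypothetical "positive emotion" dictionary for illustration only.
POSITIVE = {"good", "happy", "great", "hope"}
print(category_percent("The results were good and we are happy.", POSITIVE))
```

The validated instrument’s value lies in its curated, tested dictionaries and norms, not in this simple arithmetic.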
78. EXTRA: SOME ANALYTICAL APPLICATIONS OF LIWC IN THE
RESEARCH LITERATURE
Author identification (historical and present), (in)validation of authorship
Predictive analytics
Fraud detection (people and “self-invisible” tells)
Suicide intervention
Remote personality (ego and entity) profiling (longitudinal, episodic)
Political leader profiling
Political group profiling
Ideology profiling
Human landscape mapping, social network mapping
Elicitation of power dynamics
Terrorist group mapping