1. Using Clustering as a Tool:
Mixed Methods in Qualitative Data Analysis
Laura Macia, PhD
Behavioral and Community Health Sciences
Graduate School of Public Health
University of Pittsburgh
5. Cluster Analysis
• Method for grouping data by their similarity
– Appropriate data
– Defining similarity
– Clustering
6. Data Preparation
• Types of data:
– Nominal
– Ordinal
– Interval / Ratio
Qualitative data (an example): Latino Grievances Project
7. Summary Table: Nodes and Attributes (after thematic analysis using NVivo)
Select variables     Values [description]
Part 1:
  Gender             (0) Male; (1) Female
  Strata             (0) Blue-collar; (1) Spouse of American citizen; (2) White-collar
  Legal status       (0) US citizen; (1) Legal permanent resident; (2) Immigrant visa; (3) Non-immigrant visa; (4) Visa overstay; (5) Undocumented
  Income             (0) Under $20k; (1) $20k to $40k; (2) $40k to $60k; (3) $60k to $80k; (4) $80k to $100k; (5) Over $100k
  Education          (0) Primary; (1) Some secondary; (2) High-school diploma; (3) College degree; (4) Graduate degree; (5) Other degree
Part 2:
  Type               (0) Male; (1) Female; (2) Individual [when gender unknown]; (3) Institution; (4) Government; (5) Other
  Nationality        (0) American; (1) Latino; (2) Other; (3) Unknown
Grievance            (0) Debt; (1) Discrimination; (2) Domestic; (3) With the law
Procedural mode      (1) None
                     (2) Adjudication [third party with authority to intervene, e.g., courts]
                     (3) Arbitration [third party agreed to by principals]
                     (4) Mediation [third party helping principals reach an agreement]
                     (5) Negotiation [two principals decide on a settlement]
                     (6) Coercion [imposition of outcome by unilateral threat or use of force]
                     (7) Avoidance [terminate relationship / withdraw from situation]
                     (8) Lumping it [“letting go” of the grievance]
                     (9) Assumed fault* [structure grievance as occurring due to own situation/fault]
                     (10) Talk back* [letting others know of the grievance without expecting further action]
                     (11) Other
* Data-driven codes, not included in the predefined coding scheme
8. Data Preparation
• Types of data:
– Nominal
– Ordinal
– Interval / Ratio
Qualitative Data
– Gender: (0) Male, (1) Female, … [nominal]
– Type of grievance: (0) Debt, (1) Discrimination, … [nominal]
– Chosen procedure: (2) Adjudication, … (6) Coercion, … [nominal]
– Income: (0) <$20k, (1) $20k-$40k, … [ordinal]
– Education: (0) Primary, … (2) High-school diploma, … [ordinal]
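The data types above call for different preparation before similarities are computed. A minimal sketch, assuming the analysis uses binary similarity measures: a nominal code has no order, so each category becomes its own 0/1 indicator, while an ordinal code keeps its order. The function names `dummy_code` and `ordinal_rank`, and rescaling ordinal codes to [0, 1], are illustrative choices, not part of the original analysis.

```python
# Grievance type codes (0)-(3), labelled as in the case table ("Law" = "With the law").
GRIEVANCE_TYPES = ["Debt", "Discrimination", "Domestic", "Law"]


def dummy_code(value, categories):
    """Nominal variable: one 0/1 indicator per category, exactly one set to 1.

    Illustrative helper, not from the original analysis.
    """
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == c else 0 for c in categories]


def ordinal_rank(code, n_levels):
    """Ordinal variable: keep the order, rescaled so codes 0..n_levels-1 map to [0, 1].

    The [0, 1] rescaling is an assumption made for this sketch.
    """
    if not 0 <= code < n_levels:
        raise ValueError(f"code {code} outside 0..{n_levels - 1}")
    return code / (n_levels - 1)


# Grievance types of cases 1-4 in the case table.
for grievance in ["Debt", "Debt", "Discrimination", "Discrimination"]:
    print(grievance, dummy_code(grievance, GRIEVANCE_TYPES))

# Income has six codes, (0) Under $20k ... (5) Over $100k; code 2 is $40k to $60k.
print(ordinal_rank(2, 6))
```

Dummy coding keeps "Debt" and "Discrimination" equally far apart from every other category, which is what a nominal scale implies; the ordinal rank, by contrast, treats $40k to $60k as closer to $20k to $40k than to Over $100k.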
9. Units of analysis: Cases
ID  Strata  Part2        Part2Natlity  Type            ProcMode1      ProcMode2      ProcMode3  Support1  Support2
1   WC      Individual   Unknown       Debt            Other          None           None       None      None
2   WC      Institution  American      Debt            Negotiation    Avoidance      None       None      None
3   WC      Female       American      Discrimination  Assumed fault  Lumping it     Talk back  None      None
4   WC      Individual   American      Discrimination  Other          None           None       None      None
5   WC      Male         Latino        Domestic        Other          Negotiation    None       Other     None
6   WC      Female       Latino        Domestic        Negotiation    Other          None       Family    None
7   WC      Male         Latino        Domestic        Negotiation    None           None       Family    None
8   WC      Government   American      Law             Negotiation    Assumed fault  Other      Family    None
9   WC      Male         Latino        Debt            Negotiation    Lumping it     Other      Family    Friend
10  WC      Female       Other         Debt            Talk back      Avoidance      Other      Family    None
11  WC      Institution  American      Debt            Avoidance      Other          None       Friend    None
12  WC      Institution  American      Debt            Other          None           None       None      None
13  WC      Male         Unknown       Debt            Assumed fault  Negotiation    None       Friend    None
14  WC      Male         American      Discrimination  Lumping it     None           None       None      None
15  WC      Institution  American      Discrimination  Other          None           None       Church    None
16  WC      Male         Other         Discrimination  Lumping it     Other          None       Family    None
17  WC      Other        Latino        Domestic        Negotiation    None           None       None      None
18  WC      Female       Other         Domestic        Negotiation    None           None       Other     None
19  WC      Female       Other         Domestic        Negotiation    Other          None       None      None
20  WC      Government   American      Law             Assumed fault  None           None       None      None
WC = White-collar (Strata code 2)
12 variables
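In this layout the procedures used in a case are spread across three ordered slots (ProcMode1 to ProcMode3), so the same procedure can sit in different columns for different cases: Negotiation is ProcMode1 in case 2 but ProcMode2 in case 5. One way to prepare such multi-response data for binary similarity measures is a presence/absence indicator per procedure. A minimal sketch, assuming "None" slots are simply empty and that slot order is not needed for the question at hand; `procedure_indicators` is a hypothetical helper name.

```python
# Cases 1-3 from the table: (ID, ProcMode1, ProcMode2, ProcMode3).
CASES = [
    (1, "Other", "None", "None"),
    (2, "Negotiation", "Avoidance", "None"),
    (3, "Assumed fault", "Lumping it", "Talk back"),
]

# One column per procedural mode in the codebook, excluding (1) None.
PROCEDURES = [
    "Adjudication", "Arbitration", "Mediation", "Negotiation", "Coercion",
    "Avoidance", "Lumping it", "Assumed fault", "Talk back", "Other",
]


def procedure_indicators(modes, procedures=PROCEDURES):
    """Turn the ordered ProcMode slots into one 0/1 flag per procedure.

    Hypothetical helper. Assumptions: "None" slots are skipped as empty,
    and the order of the slots (first, second, third mode) is discarded.
    """
    used = {m for m in modes if m != "None"}
    unknown = used - set(procedures)
    if unknown:
        raise ValueError(f"unrecognised procedures: {sorted(unknown)}")
    return [1 if p in used else 0 for p in procedures]


matrix = {case_id: procedure_indicators(modes) for case_id, *modes in CASES}
for case_id, row in matrix.items():
    print(case_id, row)
```

Dropping the slot order is a real analytic choice: if the sequence of procedures matters to the research question, it would need its own variables rather than being folded into these flags.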
11. Clustering decisions: variables
• Variables to include
– All relevant variables: what is your question?
• Variables to exclude
– Irrelevant variables that bias the results toward certain cluster solutions
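One mechanical check that can accompany these judgment calls is screening out variables that do not vary. In the 20 cases shown in the case table, Strata is WC for every case, so within that excerpt it adds the same match to every pair of cases and cannot separate clusters. A minimal sketch, assuming the screen is run on the full set of cases (Strata may well vary there); `informative_columns` is a hypothetical helper name.

```python
# Three cases from the table (IDs 1, 2 and 5), with three of its columns.
ROWS = [
    {"ID": 1, "Strata": "WC", "Part2Natlity": "Unknown", "Type": "Debt"},
    {"ID": 2, "Strata": "WC", "Part2Natlity": "American", "Type": "Debt"},
    {"ID": 5, "Strata": "WC", "Part2Natlity": "Latino", "Type": "Domestic"},
]


def informative_columns(rows, columns):
    """Keep the columns that take at least two distinct values across the cases.

    Hypothetical helper. A constant column adds the same match to every pair
    of cases, so it cannot help separate clusters; which varying columns to
    keep still depends on the research question.
    """
    return [col for col in columns if len({row[col] for row in rows}) > 1]


print(informative_columns(ROWS, ["Strata", "Part2Natlity", "Type"]))
# ['Part2Natlity', 'Type']
```

This only catches the trivial case; deciding whether a varying variable is relevant to the question remains a substantive choice.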
12. Clustering decisions: similarity
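• For binary data: contingency tables
• For each pair of cases, count a (both 1), b (first 1, second 0), c (first 0, second 1), and d (both 0)
• Pay attention to the a, b, c, and d counts in your data:
– Which are more common?
– Which are more meaningful?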
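The a, b, c, d counts matter because similarity coefficients weigh them differently. A minimal sketch comparing two standard coefficients: simple matching, (a + d) / (a + b + c + d), which counts joint absences as agreement, and Jaccard, a / (a + b + c), which ignores them. The vectors are procedural-mode indicators built from the ProcMode1 to ProcMode3 columns for cases 3, 6 and 7 of the case table; the indicator order, skipping "None" slots, and returning NaN when Jaccard is undefined are assumptions of this sketch.

```python
import math

# Assumed indicator order: the codebook's procedural modes, minus (1) None.
PROCEDURES = ["Adjudication", "Arbitration", "Mediation", "Negotiation", "Coercion",
              "Avoidance", "Lumping it", "Assumed fault", "Talk back", "Other"]


def indicators(used):
    """0/1 vector marking which procedures a case used (hypothetical helper)."""
    return [1 if p in used else 0 for p in PROCEDURES]


def contingency(x, y):
    """Count the 2x2 table cells a, b, c, d for two equal-length 0/1 vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    pairs = list(zip(x, y))
    a = pairs.count((1, 1))  # both cases have the attribute
    b = pairs.count((1, 0))  # only the first case has it
    c = pairs.count((0, 1))  # only the second case has it
    d = pairs.count((0, 0))  # neither has it (joint absence)
    return a, b, c, d


def simple_matching(x, y):
    """(a + d) / (a + b + c + d): joint absences count as agreement."""
    a, b, c, d = contingency(x, y)
    return (a + d) / (a + b + c + d)


def jaccard(x, y):
    """a / (a + b + c): joint absences ignored; NaN (assumed convention) if a + b + c == 0."""
    a, b, c, _ = contingency(x, y)
    return a / (a + b + c) if a + b + c else math.nan


case3 = indicators({"Assumed fault", "Lumping it", "Talk back"})
case6 = indicators({"Negotiation", "Other"})
case7 = indicators({"Negotiation"})

for name, x, y in [("6 vs 7", case6, case7), ("3 vs 7", case3, case7)]:
    print(name, contingency(x, y), simple_matching(x, y), jaccard(x, y))
```

Cases 6 and 7 share only Negotiation: simple matching gives 0.9 but Jaccard gives 0.5, because eight of the ten cells are joint absences. Cases 3 and 7 share no procedure at all, yet simple matching still rates them 0.6, while Jaccard gives 0. When most procedures are absent from most cases, d dominates, which is the point of asking whether joint absences are meaningful.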
14. Clustering decisions: linkage
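• Classification strategy
– Hierarchical clustering
• Good for “smaller” data sets (cases in the hundreds)
• Allows choosing from many similarity measures
• Randomize case order, repeat, and compare solutions (tied distances can make results depend on case order)
• Agglomerative (merge cases bottom-up) or divisive (split clusters top-down)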
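The advice to randomize order, repeat, and compare can be illustrated with a small agglomerative run. A minimal sketch, not the original analysis: it uses Jaccard distance on the procedures used by cases 1, 6 and 7 of the case table ("None" slots dropped), between-groups (average) linkage, and breaks ties by taking the first closest pair found, which is an assumption about tie handling. Case 6 is equally close to cases 1 and 7, so the two-cluster solution depends on which tied pair is met first. The helper names are hypothetical, and the naive loop is only meant for a handful of cases.

```python
import random
from itertools import combinations

# Procedures used by cases 1, 6 and 7 (ProcMode1-3, "None" slots dropped).
PROCS = {1: {"Other"}, 6: {"Negotiation", "Other"}, 7: {"Negotiation"}}


def jaccard_distance(i, j):
    """1 - |shared| / |used by either|; 0 for two cases that both used nothing (assumed)."""
    union = PROCS[i] | PROCS[j]
    return 1 - len(PROCS[i] & PROCS[j]) / len(union) if union else 0.0


def agglomerate(order, dist, k):
    """Naive agglomerative clustering with between-groups (average) linkage.

    Starts from singletons and merges the closest pair until k clusters remain.
    Ties go to the first pair found, so the case order can change the result.
    """
    clusters = [[case] for case in order]
    while len(clusters) > k:
        best = None
        for p, q in combinations(range(len(clusters)), 2):
            d = sum(dist(x, y) for x in clusters[p] for y in clusters[q])
            d /= len(clusters[p]) * len(clusters[q])
            if best is None or d < best[0]:  # strict '<': earlier pair wins ties
                best = (d, p, q)
        _, p, q = best
        clusters[p] += clusters[q]
        del clusters[q]  # q > p, so index p is unaffected
    return frozenset(frozenset(c) for c in clusters)


def compare_orders(cases, dist, k, repeats=20, seed=1):
    """Cluster shuffled copies of the case order and collect the distinct solutions."""
    rng = random.Random(seed)
    solutions = set()
    for _ in range(repeats):
        order = list(cases)
        rng.shuffle(order)
        solutions.add(agglomerate(order, dist, k))
    return solutions


print(agglomerate([1, 6, 7], jaccard_distance, 2))  # groups 1 with 6
print(agglomerate([7, 6, 1], jaccard_distance, 2))  # groups 6 with 7
print(len(compare_orders(PROCS, jaccard_distance, 2)))
```

If repeated shuffles keep producing the same partition, the solution is not an artifact of case order; if they disagree, as they can here, the competing solutions are worth inspecting qualitatively.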
15. Clustering decisions: method
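• Linkage method:
• NOT: centroid, median, or Ward (these assume squared Euclidean distances, which do not fit binary similarity measures)
• Between-groups linkage (average linkage):
d = average distance over all pairs of cases, one from each cluster
• Within-groups linkage:
d = average distance over all pairs of cases in the merged cluster
• Nearest neighbor (single linkage):
d = smallest distance between two points, one from each cluster
• Furthest neighbor (complete linkage):
d = largest distance between two points, one from each cluster
• At each step, merge the two clusters with the smallest d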
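The four linkage rules differ only in how they turn point-to-point distances into a cluster-to-cluster value d. A minimal sketch of each rule as a function of two clusters A and B and a distance function; the function names are hypothetical, and the example uses toy one-dimensional points with absolute difference as the distance, not study data.

```python
from itertools import combinations


def single_link(A, B, dist):
    """Nearest neighbor: smallest distance between a point in A and a point in B."""
    return min(dist(a, b) for a in A for b in B)


def complete_link(A, B, dist):
    """Furthest neighbor: largest distance between a point in A and a point in B."""
    return max(dist(a, b) for a in A for b in B)


def between_groups(A, B, dist):
    """Average distance over all cross-cluster pairs (a in A, b in B)."""
    return sum(dist(a, b) for a in A for b in B) / (len(A) * len(B))


def within_groups(A, B, dist):
    """Average distance over all pairs of points inside the merged cluster A + B."""
    pairs = list(combinations(list(A) + list(B), 2))
    return sum(dist(x, y) for x, y in pairs) / len(pairs)


def dist(x, y):
    """Toy distance for the example: absolute difference on a line."""
    return abs(x - y)


A, B = [0, 1], [3, 7]  # toy clusters, not study data
for rule in (single_link, complete_link, between_groups, within_groups):
    print(rule.__name__, rule(A, B, dist))
```

For these toy clusters the rules give 2, 7, 4.5 and 23/6 (about 3.83) respectively. An agglomerative run would compute the chosen rule for every pair of current clusters and merge the pair with the smallest value; single linkage tends to chain clusters together through close individual points, while complete linkage favors compact clusters.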