Clustering Models to Assist in Student Outreach
February 8, 2022
1. Why clustering?
2. Example 1, hierarchical clustering
3. Questions
4. Overview of common algorithms and strengths/limitations
5. Example 2, K-means clustering
6. Questions
Reverse looking
• What are the characteristics of students who used or did not use a service? Who did or did not persist?
• Who is the prototypical student in each academic program?
Forward looking
• What characteristics comprise our prospective student personas?
• How do risk factors overlap in some groups of students?
The challenge
Contrary to national trends, Dallas College has experienced larger-than-typical drops in female student re-enrollment. Our goal was to stem these decreases for the Fall 2021 semester through a text campaign.
The question
What messaging should be used to resonate with these students?
45k students
Diverse features for the model
• Demographics: race/ethnicity, gender, age
• Financial: income, employment intensity
• Household: household size, dependent count
• Academics: last term of enrollment, credits, GPA
• Special population membership
Step 1
• Start with one case as a leaf node
• Each subsequent case either joins an existing leaf node or starts a new node
• The result is a Cluster Features (CF) tree
Step 2
• Leaf nodes are combined through agglomeration (agglomerative hierarchical clustering)
• The result is several candidate “best” cluster solutions, each with a silhouette score
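The two steps above can be sketched in plain Python. This is a simplified, hypothetical illustration, not SPSS's proprietary Two-step and not a full BIRCH implementation: it keeps a flat list of cluster-feature (CF) entries instead of a balanced tree, uses Euclidean distance on numeric features only, and the `threshold` value and all function names are assumptions. Step 1 absorbs each case into its nearest CF entry if the entry's radius stays within the threshold, otherwise it starts a new entry; step 2 merges entries agglomeratively (closest centroids first), and each candidate number of clusters is scored with the silhouette coefficient.

```python
import math

def dist(a, b):
    """Euclidean distance between two numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class CF:
    """Cluster feature entry: count n, linear sum ls, sum of squared norms ss."""
    def __init__(self, p):
        self.n, self.ls, self.ss = 1, list(p), sum(x * x for x in p)

    def centroid(self):
        return [s / self.n for s in self.ls]

    def radius_if_added(self, p):
        # Radius = sqrt(ss/n - ||ls/n||^2), computed as if p had been absorbed.
        n = self.n + 1
        ls = [s + x for s, x in zip(self.ls, p)]
        ss = self.ss + sum(x * x for x in p)
        return math.sqrt(max(ss / n - sum((s / n) ** 2 for s in ls), 0.0))

    def add(self, p):
        self.n += 1
        self.ls = [s + x for s, x in zip(self.ls, p)]
        self.ss += sum(x * x for x in p)

def build_leaves(points, threshold):
    """Step 1 (simplified, flat list): join the nearest leaf entry or start a new one."""
    leaves, leaf_of = [], []
    for p in points:
        best = min(range(len(leaves)), key=lambda i: dist(leaves[i].centroid(), p), default=None)
        if best is not None and leaves[best].radius_if_added(p) <= threshold:
            leaves[best].add(p)
        else:
            leaves.append(CF(p))
            best = len(leaves) - 1
        leaf_of.append(best)
    return leaves, leaf_of

def agglomerate(leaves, k):
    """Step 2: merge the two groups with the closest centroids until k remain."""
    groups = [([i], leaf.n, leaf.ls[:]) for i, leaf in enumerate(leaves)]
    while len(groups) > k:
        cents = [[s / n for s in ls] for _, n, ls in groups]
        pairs = [(a, b) for a in range(len(groups)) for b in range(a + 1, len(groups))]
        i, j = min(pairs, key=lambda ab: dist(cents[ab[0]], cents[ab[1]]))
        (ids_i, n_i, ls_i), (ids_j, n_j, ls_j) = groups[i], groups[j]
        merged = (ids_i + ids_j, n_i + n_j, [x + y for x, y in zip(ls_i, ls_j)])
        groups = [g for t, g in enumerate(groups) if t not in (i, j)] + [merged]
    return [ids for ids, _, _ in groups]

def silhouette(points, labels):
    """Mean silhouette coefficient (requires at least two clusters)."""
    scores = []
    for i, p in enumerate(points):
        same = [dist(p, q) for j, q in enumerate(points) if j != i and labels[j] == labels[i]]
        if not same:  # singleton clusters score 0 by convention
            scores.append(0.0)
            continue
        a = sum(same) / len(same)
        b = min(sum(dist(p, q) for q, l in zip(points, labels) if l == c) / labels.count(c)
                for c in set(labels) if c != labels[i])
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

def two_step(points, threshold, k_values):
    """Return {k: (silhouette, labels)} for each feasible candidate k."""
    leaves, leaf_of = build_leaves(points, threshold)
    results = {}
    for k in k_values:
        if not 2 <= k <= len(leaves):
            continue  # cannot form k clusters from fewer leaf entries
        cluster_of_leaf = {leaf: c for c, ids in enumerate(agglomerate(leaves, k)) for leaf in ids}
        labels = [cluster_of_leaf[leaf] for leaf in leaf_of]
        results[k] = (silhouette(points, labels), labels)
    return results
```

Choosing the "best" solution then means picking the k with the highest silhouette. For real data, scikit-learn's `Birch` (whose `n_clusters` argument accepts a clustering model such as `AgglomerativeClustering`) together with `silhouette_score` covers the same idea for continuous features.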
• African American, 3 dependents, part-time job
• Hispanic, 4 dependents, part-time job
• Hispanic, 2 dependents, full-time job
• White, 1 dependent, full-time job
• BIRCH and Two-step (Two-step is proprietary to SPSS; used in Example 1)
• K-means, K-modes, K-medoids (Example 2)
• Hierarchical (agglomerative and divisive)
[Figure: dendrogram. Clusters are defined by where a horizontal distance cutoff line cuts the tree.]
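Cutting a dendrogram at a distance cutoff can be sketched as follows. This is a minimal, hypothetical illustration: it assumes average linkage and Euclidean distance on numeric features (the slides do not name the linkage or the measure) and stops merging as soon as the closest pair of clusters is farther apart than `cutoff`. Because average-linkage merge heights never decrease, stopping there gives the same clusters as drawing the horizontal cutoff line on the dendrogram.

```python
import math
from itertools import combinations

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_linkage(c1, c2, points):
    """Mean pairwise distance between the members of two clusters."""
    return sum(euclidean(points[i], points[j]) for i in c1 for j in c2) / (len(c1) * len(c2))

def cut_dendrogram(points, cutoff):
    """Agglomerate bottom-up; stop when the closest pair lies above the cutoff line."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > 1:
        (a, b), d = min(
            (((a, b), average_linkage(clusters[a], clusters[b], points))
             for a, b in combinations(range(len(clusters)), 2)),
            key=lambda pair_and_dist: pair_and_dist[1],
        )
        if d > cutoff:
            break  # every remaining merge would sit above the cutoff line
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # combinations yields a < b, so index a is unaffected
    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels
```

Raising the cutoff yields fewer, larger clusters; lowering it yields more, smaller ones. The same cut can be made with SciPy's `linkage(..., method="average")` followed by `fcluster(Z, t=cutoff, criterion="distance")`. Computing all pairwise distances is also why hierarchical clustering is limited to smaller data.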
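Variable types by algorithm
• BIRCH / Two-step: any, multiple types at the same time (SPSS Two-step handles mixed types with its log-likelihood distance; classic BIRCH works on continuous features)
• Hierarchical: any, but stick to one kind at a time and use a well-paired distance measure
• K-means: continuous
• K-modes: categorical
• K-medoids: any, multiple types at the same time with the right distance measure (Gower)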
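Gower's distance is what lets K-medoids (or hierarchical clustering) mix variable types. A minimal sketch, assuming each feature is tagged as numeric or categorical: numeric differences are scaled by that feature's observed range, categorical features contribute 0 on a match and 1 on a mismatch, and the result is the unweighted average over features. The function names and the student-like records in the usage lines are hypothetical placeholders, not project data.

```python
def gower_ranges(rows, numeric):
    """Observed range (max - min) of each numeric column; None for categorical ones."""
    ranges = []
    for col, is_num in enumerate(numeric):
        if is_num:
            values = [r[col] for r in rows]
            ranges.append(max(values) - min(values))
        else:
            ranges.append(None)
    return ranges

def gower(x, y, numeric, ranges):
    """Unweighted Gower distance in [0, 1] between two mixed-type records."""
    total = 0.0
    for xi, yi, is_num, rng in zip(x, y, numeric, ranges):
        if is_num:
            # Range-scaled absolute difference; a constant column contributes 0.
            total += abs(xi - yi) / rng if rng else 0.0
        else:
            # Simple matching for categories.
            total += 0.0 if xi == yi else 1.0
    return total / len(x)

# Hypothetical records: (age, race/ethnicity, employment intensity).
rows = [(30, "Hispanic", "Part-time"), (40, "Hispanic", "Full-time"), (50, "White", "Full-time")]
numeric = [True, False, False]
ranges = gower_ranges(rows, numeric)
print(gower(rows[0], rows[1], numeric, ranges))  # (10/20 + 0 + 1) / 3
```

A medoid-based method works from a full matrix of these pairwise distances, which is one reason K-medoids costs more compute than K-means.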
Strengths (+) and limitations (-) by algorithm
• BIRCH / Two-step: + flexible variable types, fast compute, large data; - there is an element of a black box
• Hierarchical: + flexible modeling, highly explainable; - limited to small data (it works from all pairwise distances)
• K-means, K-modes: + easy, fast, large data; - sensitive to the choice of K, curse of dimensionality, tends to produce similarly sized clusters
• K-medoids: like K-means, + more flexible variable types; - more costly tuning and compute
• Population: adults aged 24 and older enrolled for Fall 2021
• Features: academic, demographic, financial, household, and veteran status
• A mix of discrete and continuous variables, the majority of them categorical
[Figure: income binning relative to the median household income in Texas. Derived features: Poverty Flag and INCOME BIN. Bins: below federal poverty level; below median; lower half above median; upper half above median.]
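The binning above can be sketched as two small functions. The thresholds are hypothetical parameters: the slide does not give the federal poverty line used (which in practice depends on household size), the Texas median value, or the exact split between the lower and upper halves above the median (`upper_split` is assumed here). Incomes exactly at a threshold go to the higher bin, which is also an assumption. The numbers in the usage lines are placeholders, not real thresholds.

```python
def poverty_flag(income, poverty_line):
    """1 if income is below the (hypothetical) federal poverty line, else 0."""
    return 1 if income < poverty_line else 0

def income_bin(income, poverty_line, median, upper_split):
    """Map a continuous income to one of four ordered categories.

    upper_split is a hypothetical cut dividing above-median incomes into halves.
    """
    if not poverty_line <= median <= upper_split:
        raise ValueError("thresholds must satisfy poverty_line <= median <= upper_split")
    if income < poverty_line:
        return "Below federal poverty level"
    if income < median:
        return "Below median"
    if income < upper_split:
        return "Lower half above median"
    return "Upper half above median"

# Placeholder thresholds for illustration only.
print(income_bin(45000, poverty_line=20000, median=60000, upper_split=120000))
```

Binning turns the continuous income variable into categories, which suits a categorical method such as K-modes.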
Dissimilarity measure (simple matching): each variable adds 1 for a mismatch and 0 for a match
Sample 1: V1=0, V2=1, V3=1, V4=0, V5=1
Sample 2: V1=1, V2=0, V3=0, V4=0, V5=1
Dissimilarity = 1 + 1 + 1 + 0 + 0 = 3
• Randomly select the K initial centers
• Repeat:
1. Assign each sample to the nearest center
2. Update the means (K-means) or modes (K-modes) based on the newly formed clusters; K-modes uses a frequency-based method, taking the most frequent category of each variable instead of the mean of the sample
3. Calculate the cost (SSE for K-means, sum of dissimilarities for K-modes)
• Stop when the cluster centers converge
Goal: minimize the cost function
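The K-modes loop and its matching dissimilarity can be sketched in plain Python. This is a minimal, hypothetical illustration of standard K-modes, not the team's code: initial centers are drawn at random from the data (seeded for repeatability) unless `init_centers` is passed, ties in the mode go to the category seen first, an empty cluster keeps its previous center, and the loop stops when an update leaves the centers unchanged. The function names and the toy rows are assumptions.

```python
import random
from collections import Counter

def matching_dissimilarity(a, b):
    """Simple matching: count the variables on which two records disagree."""
    return sum(1 for x, y in zip(a, b) if x != y)

def column_modes(rows):
    """Frequency-based center: most common category per variable (ties: first seen)."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))

def assign(rows, centers):
    """Index of the nearest center for each row (ties go to the lower index)."""
    return [min(range(len(centers)), key=lambda c: matching_dissimilarity(r, centers[c]))
            for r in rows]

def k_modes(rows, k, init_centers=None, max_iter=100, seed=0):
    # Randomly select K initial centers unless explicit ones are supplied.
    if init_centers is None:
        init_centers = random.Random(seed).sample(rows, k)
    centers = [tuple(c) for c in init_centers]
    for _ in range(max_iter):
        labels = assign(rows, centers)                      # 1. assign to nearest center
        new_centers = [column_modes([r for r, l in zip(rows, labels) if l == c])
                       if c in labels else centers[c]       # empty cluster keeps its center
                       for c in range(k)]                   # 2. update modes
        if new_centers == centers:                          # stop when centers converge
            break
        centers = new_centers
    labels = assign(rows, centers)
    # 3. Cost: total dissimilarity between samples and their assigned centers.
    cost = sum(matching_dissimilarity(r, centers[l]) for r, l in zip(rows, labels))
    return labels, centers, cost

# Hypothetical categorical rows: (race/ethnicity, employment, dependents).
rows = [("Hispanic", "Part-time", "3+"), ("Hispanic", "Part-time", "3+"),
        ("Hispanic", "Part-time", "1-2"), ("White", "Full-time", "1-2"),
        ("White", "Full-time", "1-2"), ("White", "Full-time", "3+")]
labels, centers, cost = k_modes(rows, 2, init_centers=[rows[0], rows[5]])
```

For Sample 1 and Sample 2, `matching_dissimilarity` counts three mismatches (V1, V2, V3). Because a single random start can land both centers in the same group, it is common to run several seeds and keep the lowest-cost solution.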
Student Personas
Missing values for categorical variables, such as employment status, and for continuous variables, such as income
Possible solutions
• Revisit our data warehouse to obtain as much information as we can to fill in the missing values
• Imputation with K-nearest neighbors using Hamming distance for categorical data
• Imputation with K-nearest neighbors using Euclidean distance for continuous data
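The two imputation ideas can be sketched with one helper. This is a minimal, hypothetical illustration, not the approach the team settled on: `None` marks a missing value, distances use only the columns (other than the target) observed in both rows, Hamming distance with a majority vote fills a categorical target, and Euclidean distance with the neighbor mean fills a continuous one. The function names, `k`, and the toy rows are assumptions; in practice continuous features would usually be scaled first.

```python
import math
from collections import Counter

def hamming(a, b, cols):
    """Share of the given columns on which two categorical rows disagree."""
    return sum(a[c] != b[c] for c in cols) / len(cols)

def euclidean(a, b, cols):
    """Euclidean distance over the given numeric columns."""
    return math.sqrt(sum((a[c] - b[c]) ** 2 for c in cols))

def majority(values):
    return Counter(values).most_common(1)[0][0]

def mean(values):
    return sum(values) / len(values)

def knn_impute(rows, target, k, distance, combine):
    """Return copies of rows with missing rows[i][target] filled from k nearest donors."""
    filled = [list(r) for r in rows]
    donors = [r for r in rows if r[target] is not None]  # imputed values never become donors
    for row in filled:
        if row[target] is not None:
            continue
        scored = []
        for d in donors:
            shared = [c for c in range(len(row))
                      if c != target and row[c] is not None and d[c] is not None]
            if shared:
                scored.append((distance(row, d, shared), d[target]))
        scored.sort(key=lambda s: s[0])
        neighbors = [value for _, value in scored[:k]]
        if neighbors:
            row[target] = combine(neighbors)
    return filled

# Hypothetical rows: (gender, employment status, race/ethnicity), status missing.
cat_rows = [("F", "Part-time", "Hispanic"), ("F", "Part-time", "Hispanic"),
            ("F", "Full-time", "White"), ("F", None, "Hispanic")]
cat_filled = knn_impute(cat_rows, target=1, k=2, distance=hamming, combine=majority)

# Hypothetical rows: (age, credits, income), income missing.
num_rows = [(25, 12, 30000.0), (26, 12, 32000.0), (50, 3, 80000.0), (25, 12, None)]
num_filled = knn_impute(num_rows, target=2, k=2, distance=euclidean, combine=mean)
```

scikit-learn's `KNNImputer` covers the continuous case with a NaN-aware Euclidean distance; the categorical case needs either a custom metric like the one above or an encoding step.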
Jeremy Anderson, Associate Vice Chancellor of Strategic Analytics, jeremy.anderson@dcccd.edu
Dillon Lu, Data Analyst, DLu@dcccd.edu

More Related Content

Similar to Clustering Models to Assist in Student Outreach

Proposal defense apr19
Proposal defense apr19Proposal defense apr19
Proposal defense apr19rgeurtz
 
Using Data to Mobilize Commuities and Change Lives
Using Data to Mobilize Commuities and Change LivesUsing Data to Mobilize Commuities and Change Lives
Using Data to Mobilize Commuities and Change LivesRaisingTheBar2015
 
Margaret Patton, Dissertation Defense, Dr. William Allan Kritsonis, PhD Disse...
Margaret Patton, Dissertation Defense, Dr. William Allan Kritsonis, PhD Disse...Margaret Patton, Dissertation Defense, Dr. William Allan Kritsonis, PhD Disse...
Margaret Patton, Dissertation Defense, Dr. William Allan Kritsonis, PhD Disse...William Kritsonis
 
Dr. Margaret Curette Patton, PhD Dissertation, Dr. William Allan Kritsonis, D...
Dr. Margaret Curette Patton, PhD Dissertation, Dr. William Allan Kritsonis, D...Dr. Margaret Curette Patton, PhD Dissertation, Dr. William Allan Kritsonis, D...
Dr. Margaret Curette Patton, PhD Dissertation, Dr. William Allan Kritsonis, D...William Kritsonis
 
Transitioning to Common Core: What it means, What to look for
Transitioning to Common Core: What it means, What to look forTransitioning to Common Core: What it means, What to look for
Transitioning to Common Core: What it means, What to look forCurriculum Associates
 
Local Critical Issue Version 2
Local Critical Issue Version 2Local Critical Issue Version 2
Local Critical Issue Version 2Tim Hasse
 
Getting to the Root Causes of Disproportionate Representation in Special Educ...
Getting to the Root Causes of Disproportionate Representation in Special Educ...Getting to the Root Causes of Disproportionate Representation in Special Educ...
Getting to the Root Causes of Disproportionate Representation in Special Educ...SPPTAP
 
Connecticut Core 2014
Connecticut Core 2014Connecticut Core 2014
Connecticut Core 2014EdAdvance
 
2.why the common_core_presentation_with_facilitators_notes_update_072213
2.why the common_core_presentation_with_facilitators_notes_update_0722132.why the common_core_presentation_with_facilitators_notes_update_072213
2.why the common_core_presentation_with_facilitators_notes_update_072213WRHSlibrary
 
New Canaan BOE
New Canaan BOENew Canaan BOE
New Canaan BOEEdAdvance
 
SACAC Session E.12 Sealing the Deal
SACAC Session E.12 Sealing the DealSACAC Session E.12 Sealing the Deal
SACAC Session E.12 Sealing the DealRaise.me
 
Localcriticalissueversion2 091109081645 Phpapp01
Localcriticalissueversion2 091109081645 Phpapp01Localcriticalissueversion2 091109081645 Phpapp01
Localcriticalissueversion2 091109081645 Phpapp01Tim Hasse
 
Chautauqua County School Board Dinner
Chautauqua County School Board DinnerChautauqua County School Board Dinner
Chautauqua County School Board DinnerJohn Sipple
 
Funding Dries Up For Non Profit And Educational Institutions Serving Black Co...
Funding Dries Up For Non Profit And Educational Institutions Serving Black Co...Funding Dries Up For Non Profit And Educational Institutions Serving Black Co...
Funding Dries Up For Non Profit And Educational Institutions Serving Black Co...Larry Cochran, MBA
 
Missouri ACT Identified Keys to Enrollment Success
Missouri ACT Identified Keys to Enrollment SuccessMissouri ACT Identified Keys to Enrollment Success
Missouri ACT Identified Keys to Enrollment SuccessStephaneGeyer
 
Top 11 Metrics Every Financial Aid Director Should Be Measuring
Top 11 Metrics Every Financial Aid Director Should Be MeasuringTop 11 Metrics Every Financial Aid Director Should Be Measuring
Top 11 Metrics Every Financial Aid Director Should Be MeasuringCampusLogic
 

Similar to Clustering Models to Assist in Student Outreach (20)

Proposal defense apr19
Proposal defense apr19Proposal defense apr19
Proposal defense apr19
 
Using Data to Mobilize Commuities and Change Lives
Using Data to Mobilize Commuities and Change LivesUsing Data to Mobilize Commuities and Change Lives
Using Data to Mobilize Commuities and Change Lives
 
CB FORUM FINAL
CB FORUM FINALCB FORUM FINAL
CB FORUM FINAL
 
Margaret Patton, Dissertation Defense, Dr. William Allan Kritsonis, PhD Disse...
Margaret Patton, Dissertation Defense, Dr. William Allan Kritsonis, PhD Disse...Margaret Patton, Dissertation Defense, Dr. William Allan Kritsonis, PhD Disse...
Margaret Patton, Dissertation Defense, Dr. William Allan Kritsonis, PhD Disse...
 
Dr. Margaret Curette Patton, PhD Dissertation, Dr. William Allan Kritsonis, D...
Dr. Margaret Curette Patton, PhD Dissertation, Dr. William Allan Kritsonis, D...Dr. Margaret Curette Patton, PhD Dissertation, Dr. William Allan Kritsonis, D...
Dr. Margaret Curette Patton, PhD Dissertation, Dr. William Allan Kritsonis, D...
 
Transitioning to Common Core: What it means, What to look for
Transitioning to Common Core: What it means, What to look forTransitioning to Common Core: What it means, What to look for
Transitioning to Common Core: What it means, What to look for
 
Local Critical Issue Version 2
Local Critical Issue Version 2Local Critical Issue Version 2
Local Critical Issue Version 2
 
Union Col lecture
Union Col lectureUnion Col lecture
Union Col lecture
 
Rabbani - Education
Rabbani - EducationRabbani - Education
Rabbani - Education
 
Getting to the Root Causes of Disproportionate Representation in Special Educ...
Getting to the Root Causes of Disproportionate Representation in Special Educ...Getting to the Root Causes of Disproportionate Representation in Special Educ...
Getting to the Root Causes of Disproportionate Representation in Special Educ...
 
Connecticut Core 2014
Connecticut Core 2014Connecticut Core 2014
Connecticut Core 2014
 
2.why the common_core_presentation_with_facilitators_notes_update_072213
2.why the common_core_presentation_with_facilitators_notes_update_0722132.why the common_core_presentation_with_facilitators_notes_update_072213
2.why the common_core_presentation_with_facilitators_notes_update_072213
 
New Canaan BOE
New Canaan BOENew Canaan BOE
New Canaan BOE
 
SACAC Session E.12 Sealing the Deal
SACAC Session E.12 Sealing the DealSACAC Session E.12 Sealing the Deal
SACAC Session E.12 Sealing the Deal
 
Localcriticalissueversion2 091109081645 Phpapp01
Localcriticalissueversion2 091109081645 Phpapp01Localcriticalissueversion2 091109081645 Phpapp01
Localcriticalissueversion2 091109081645 Phpapp01
 
Chautauqua County School Board Dinner
Chautauqua County School Board DinnerChautauqua County School Board Dinner
Chautauqua County School Board Dinner
 
Funding Dries Up For Non Profit And Educational Institutions Serving Black Co...
Funding Dries Up For Non Profit And Educational Institutions Serving Black Co...Funding Dries Up For Non Profit And Educational Institutions Serving Black Co...
Funding Dries Up For Non Profit And Educational Institutions Serving Black Co...
 
Credit risk predictive analytics
Credit risk predictive analytics Credit risk predictive analytics
Credit risk predictive analytics
 
Missouri ACT Identified Keys to Enrollment Success
Missouri ACT Identified Keys to Enrollment SuccessMissouri ACT Identified Keys to Enrollment Success
Missouri ACT Identified Keys to Enrollment Success
 
Top 11 Metrics Every Financial Aid Director Should Be Measuring
Top 11 Metrics Every Financial Aid Director Should Be MeasuringTop 11 Metrics Every Financial Aid Director Should Be Measuring
Top 11 Metrics Every Financial Aid Director Should Be Measuring
 

More from Jeremy Anderson

Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryJeremy Anderson
 
Creating a Print-on-Demand Initiative for Open Educational Resources
Creating a Print-on-Demand Initiative for Open Educational ResourcesCreating a Print-on-Demand Initiative for Open Educational Resources
Creating a Print-on-Demand Initiative for Open Educational ResourcesJeremy Anderson
 
How any institution can get started on learning analytics
How any institution can get started on learning analyticsHow any institution can get started on learning analytics
How any institution can get started on learning analyticsJeremy Anderson
 
Creating Wraparound Supports for Students through Internal Partnerships
Creating Wraparound Supports for Students through Internal PartnershipsCreating Wraparound Supports for Students through Internal Partnerships
Creating Wraparound Supports for Students through Internal PartnershipsJeremy Anderson
 
Four LMS Tools to Change Your Life
Four LMS Tools to Change Your LifeFour LMS Tools to Change Your Life
Four LMS Tools to Change Your LifeJeremy Anderson
 
Addressing the Adjunct Underclass: Fit and Employment Outcomes in Part-Time F...
Addressing the Adjunct Underclass: Fit and Employment Outcomes in Part-Time F...Addressing the Adjunct Underclass: Fit and Employment Outcomes in Part-Time F...
Addressing the Adjunct Underclass: Fit and Employment Outcomes in Part-Time F...Jeremy Anderson
 
Plan an OER Stragegic Plan: Change Models and Processes
Plan an OER Stragegic Plan: Change Models and ProcessesPlan an OER Stragegic Plan: Change Models and Processes
Plan an OER Stragegic Plan: Change Models and ProcessesJeremy Anderson
 
Sustaining Accessible OER at Scale
Sustaining Accessible OER at ScaleSustaining Accessible OER at Scale
Sustaining Accessible OER at ScaleJeremy Anderson
 
Hybrid Course Design - Will It Blend?!
Hybrid Course Design - Will It Blend?!Hybrid Course Design - Will It Blend?!
Hybrid Course Design - Will It Blend?!Jeremy Anderson
 
The Triple A (AAA) of OER: Accessibility, Availability, and Affordability
The Triple A (AAA) of OER: Accessibility, Availability, and AffordabilityThe Triple A (AAA) of OER: Accessibility, Availability, and Affordability
The Triple A (AAA) of OER: Accessibility, Availability, and AffordabilityJeremy Anderson
 
Case Study: Increasing Access through OER Adoption
Case Study: Increasing Access through OER AdoptionCase Study: Increasing Access through OER Adoption
Case Study: Increasing Access through OER AdoptionJeremy Anderson
 
What data can deliver: A new way of operating
What data can deliver: A new way of operatingWhat data can deliver: A new way of operating
What data can deliver: A new way of operatingJeremy Anderson
 
2018 Horizon Report Webinar: Adaptive Learning and OER at Scale
2018 Horizon Report Webinar: Adaptive Learning and OER at Scale2018 Horizon Report Webinar: Adaptive Learning and OER at Scale
2018 Horizon Report Webinar: Adaptive Learning and OER at ScaleJeremy Anderson
 
The Nitty Gritty of OER Adoption
The Nitty Gritty of OER AdoptionThe Nitty Gritty of OER Adoption
The Nitty Gritty of OER AdoptionJeremy Anderson
 
The Path to Creating an Integrated Online Contingent Faculty Competency System
The Path to Creating an Integrated Online Contingent Faculty Competency SystemThe Path to Creating an Integrated Online Contingent Faculty Competency System
The Path to Creating an Integrated Online Contingent Faculty Competency SystemJeremy Anderson
 
Strategies to Scale Adaptive Course Design
Strategies to Scale Adaptive Course DesignStrategies to Scale Adaptive Course Design
Strategies to Scale Adaptive Course DesignJeremy Anderson
 
OER in the Time of Adaptive
OER in the Time of AdaptiveOER in the Time of Adaptive
OER in the Time of AdaptiveJeremy Anderson
 
Implementing Adaptive, Data-Driven Course Design to Improve Student Learning
Implementing Adaptive, Data-Driven Course Design to Improve Student LearningImplementing Adaptive, Data-Driven Course Design to Improve Student Learning
Implementing Adaptive, Data-Driven Course Design to Improve Student LearningJeremy Anderson
 
Lessons from Adopting an Adaptive Learning Platform
Lessons from Adopting an Adaptive Learning PlatformLessons from Adopting an Adaptive Learning Platform
Lessons from Adopting an Adaptive Learning PlatformJeremy Anderson
 

More from Jeremy Anderson (20)

Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
How I Lead and Manage
How I Lead and ManageHow I Lead and Manage
How I Lead and Manage
 
Creating a Print-on-Demand Initiative for Open Educational Resources
Creating a Print-on-Demand Initiative for Open Educational ResourcesCreating a Print-on-Demand Initiative for Open Educational Resources
Creating a Print-on-Demand Initiative for Open Educational Resources
 
How any institution can get started on learning analytics
How any institution can get started on learning analyticsHow any institution can get started on learning analytics
How any institution can get started on learning analytics
 
Creating Wraparound Supports for Students through Internal Partnerships
Creating Wraparound Supports for Students through Internal PartnershipsCreating Wraparound Supports for Students through Internal Partnerships
Creating Wraparound Supports for Students through Internal Partnerships
 
Four LMS Tools to Change Your Life
Four LMS Tools to Change Your LifeFour LMS Tools to Change Your Life
Four LMS Tools to Change Your Life
 
Addressing the Adjunct Underclass: Fit and Employment Outcomes in Part-Time F...
Addressing the Adjunct Underclass: Fit and Employment Outcomes in Part-Time F...Addressing the Adjunct Underclass: Fit and Employment Outcomes in Part-Time F...
Addressing the Adjunct Underclass: Fit and Employment Outcomes in Part-Time F...
 
Plan an OER Stragegic Plan: Change Models and Processes
Plan an OER Stragegic Plan: Change Models and ProcessesPlan an OER Stragegic Plan: Change Models and Processes
Plan an OER Stragegic Plan: Change Models and Processes
 
Sustaining Accessible OER at Scale
Sustaining Accessible OER at ScaleSustaining Accessible OER at Scale
Sustaining Accessible OER at Scale
 
Hybrid Course Design - Will It Blend?!
Hybrid Course Design - Will It Blend?!Hybrid Course Design - Will It Blend?!
Hybrid Course Design - Will It Blend?!
 
The Triple A (AAA) of OER: Accessibility, Availability, and Affordability
The Triple A (AAA) of OER: Accessibility, Availability, and AffordabilityThe Triple A (AAA) of OER: Accessibility, Availability, and Affordability
The Triple A (AAA) of OER: Accessibility, Availability, and Affordability
 
Case Study: Increasing Access through OER Adoption
Case Study: Increasing Access through OER AdoptionCase Study: Increasing Access through OER Adoption
Case Study: Increasing Access through OER Adoption
 
What data can deliver: A new way of operating
What data can deliver: A new way of operatingWhat data can deliver: A new way of operating
What data can deliver: A new way of operating
 
2018 Horizon Report Webinar: Adaptive Learning and OER at Scale
2018 Horizon Report Webinar: Adaptive Learning and OER at Scale2018 Horizon Report Webinar: Adaptive Learning and OER at Scale
2018 Horizon Report Webinar: Adaptive Learning and OER at Scale
 
The Nitty Gritty of OER Adoption
The Nitty Gritty of OER AdoptionThe Nitty Gritty of OER Adoption
The Nitty Gritty of OER Adoption
 
The Path to Creating an Integrated Online Contingent Faculty Competency System
The Path to Creating an Integrated Online Contingent Faculty Competency SystemThe Path to Creating an Integrated Online Contingent Faculty Competency System
The Path to Creating an Integrated Online Contingent Faculty Competency System
 
Strategies to Scale Adaptive Course Design
Strategies to Scale Adaptive Course DesignStrategies to Scale Adaptive Course Design
Strategies to Scale Adaptive Course Design
 
OER in the Time of Adaptive
OER in the Time of AdaptiveOER in the Time of Adaptive
OER in the Time of Adaptive
 
Implementing Adaptive, Data-Driven Course Design to Improve Student Learning
Implementing Adaptive, Data-Driven Course Design to Improve Student LearningImplementing Adaptive, Data-Driven Course Design to Improve Student Learning
Implementing Adaptive, Data-Driven Course Design to Improve Student Learning
 
Lessons from Adopting an Adaptive Learning Platform
Lessons from Adopting an Adaptive Learning PlatformLessons from Adopting an Adaptive Learning Platform
Lessons from Adopting an Adaptive Learning Platform
 

Recently uploaded

VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...ThinkInnovation
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home ServiceSapana Sha
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
 
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptxAmazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptxAbdelrhman abooda
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceSapana Sha
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Sapana Sha
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptxthyngster
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 

Recently uploaded (20)

VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
Call Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort ServiceCall Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort Service
 
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptxAmazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts Service
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
Decoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in ActionDecoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in Action
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 

Clustering Models to Assist in Student Outreach

  • 2. 1. Why clustering? 2. Example 1, hierarchical clustering 3. Questions 4. Overview of common algorithms and strengths/limitations 5. Example 2, K-means clustering 6. Questions 2
  • 3. 3
  • 4. 4 Reverse looking What are characteristics of students who used/did not use a service? Did/did not persist? Who is the prototypical student in each academic program? Forward looking What characteristics comprise our prospective student personas? How do risk factors overlap in some groups of students?
  • 5. 5
  • 6. The challenge Contrary to national trends, Dallas College has experienced larger than typical drops in female student re-enrollment patterns. Our goal was to stem decreases for the Fall 2021 semester through a text campaign. The question What messaging should be used to resonate with these students? 6
  • 7. 45k students Diverse features for the model • Demographics: race/ethnicity, gender, age • Financial: income, employment intensity • Household: household size, dependent count • Academics: last term of enrollment, credits, GPA • Special Pop membership 7
  • 8. 8 Step 1 Starts with a case as leaf node Each case goes into that leaf node or breaks into a new node Result is a Cluster Features tree Step 2 Leaf nodes are combined through agglomeration Result is several “best” clusters with a silhouette score
  • 9. 9
  • 10. 10 African Am. 3 dependents Part-time job Hispanic 4 dependents Part-time job Hispanic 2 dependents Full-time job White 1 dependent Full-time job
  • 11. 11
  • 12. BIRCH and Two-step (SPSS proprietary; Ex 1) K-means, K-modes, K-medoids (Ex 2) Hierarchical (Agglomerative and Divisive) 12
  • 13. 13
  • 14. 14
  • 15. 15 Clusters based on distance cutoff line Distance cutoff line
  • 16. Algorithm Variable Types BIRCH / 2-Step Any, multiple at the same time Hierarchical Any, but stick to one kind at a time and use well-paired distance measure K-means Continuous K-modes Categorical K-medoids Any, multiple at the same time with the right distance measure (Gower) 16
  • 17. 17 Algorithm +/- BIRCH / 2-Step Flexible variable types, fast compute, large data; there is an element of a black box Hierarchical Flexible modeling, highly explainable; limited to small data K-Means, K-Modes Easy, fast, large data; sensitive to K, curse of dimensionality, clusters same sized K-medoids Like K-Means but more flexible variable types and more costly tuning & compute
  • 19. Population: adults 24 years and older enrolled for Fall 2021. Features: academic, demographic, financial, household, and veteran status. The data mix discrete and continuous variables, with the majority categorical. Figure (income features): the Poverty Flag marks incomes below the federal poverty level; INCOME_BIN splits incomes at the median household income in Texas into a lower half (below median) and an upper half (above median).
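The slide names two derived income features but not their exact cutoffs, so the following is a hypothetical sketch of how they could be computed. `POVERTY_LINE` and `TEXAS_MEDIAN` are placeholders, not the actual federal guideline or Texas median, and a fuller version would look up the poverty guideline for the student's household size. Keeping missing income as its own level is a design choice that stops missing values from silently landing in one of the real bins.

```python
from typing import Optional, Tuple

# Placeholder thresholds for illustration only (hypothetical values).
POVERTY_LINE = 20_000.0
TEXAS_MEDIAN = 60_000.0


def income_features(income: Optional[float],
                    poverty_line: float = POVERTY_LINE,
                    state_median: float = TEXAS_MEDIAN) -> Tuple[Optional[bool], str]:
    """Return (poverty_flag, income_bin) for one student."""
    if income is None:
        # Keep missing income as its own level instead of letting it fall into a real bin.
        return None, "Missing"
    poverty_flag = income < poverty_line
    income_bin = "Lower half" if income < state_median else "Upper half"
    return poverty_flag, income_bin


for income in (15_000.0, 45_000.0, 80_000.0, None):
    print(income, income_features(income))
```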
  • 20. K-modes dissimilarity: compare two samples feature by feature, scoring 0 for a match and 1 for a mismatch, then sum. Sample 1 (V1 to V5) = 0, 1, 1, 0, 1; Sample 2 = 1, 0, 0, 0, 1. They mismatch on V1, V2, and V3, so the dissimilarity is 1 + 1 + 1 = 3. The algorithm:
      - Randomly select K initial centers.
      - Repeat: (1) assign the samples to the nearest center; (2) update the means/modes based on the newly formed clusters; (3) calculate the cost (SSE / sum of dissimilarities).
      - Stop when the cluster centers converge.
    K-modes uses a frequency-based method to calculate the mode instead of the mean of the sample, and minimizes the cost function.
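A from-scratch sketch of the K-modes loop on this slide, in plain Python: random initial modes, assignment by simple-matching dissimilarity, frequency-based mode updates, and a stop once assignments no longer change. The toy student records are invented. A production analysis would more likely use a maintained package (for example, the `kmodes` library) with several random restarts.

```python
import random
from collections import Counter


def matching_dissimilarity(a, b):
    """Simple-matching dissimilarity: 0 per matching feature, 1 per mismatch, summed."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(rows):
    """Frequency-based mode of each feature (ties go to the value seen first)."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))


def k_modes(data, k, max_iter=100, seed=0):
    """Cluster categorical records into k groups; returns (labels, modes, cost)."""
    rng = random.Random(seed)
    modes = [tuple(row) for row in rng.sample(data, k)]  # K random initial centers
    labels = None
    for _ in range(max_iter):
        # 1. Assign each record to the nearest mode.
        new_labels = [
            min(range(k), key=lambda c: matching_dissimilarity(row, modes[c]))
            for row in data
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the modes have converged
        labels = new_labels
        # 2. Update each mode from the records now assigned to it.
        for c in range(k):
            members = [row for row, lab in zip(data, labels) if lab == c]
            if members:  # keep the previous mode if a cluster empties out
                modes[c] = column_modes(members)
    # 3. Cost: total dissimilarity between records and their cluster modes.
    cost = sum(matching_dissimilarity(row, modes[lab]) for row, lab in zip(data, labels))
    return labels, modes, cost


# The two samples from the slide: they differ on V1, V2 and V3.
sample_1 = (0, 1, 1, 0, 1)
sample_2 = (1, 0, 0, 0, 1)
print(matching_dissimilarity(sample_1, sample_2))  # 3

# Invented categorical records: (race/ethnicity, dependents, employment).
students = [
    ("Hispanic", "3+", "part-time"),
    ("Hispanic", "3+", "part-time"),
    ("Black", "3+", "part-time"),
    ("White", "0-1", "full-time"),
    ("White", "0-1", "full-time"),
    ("Hispanic", "0-1", "full-time"),
]
labels, modes, cost = k_modes(students, k=2)
print(labels, modes, cost)
```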
  • 22. Missing data: categorical variables such as employment status and continuous variables such as income have missing values. Possible solutions:
      - Revisit our data warehouse to obtain as much information as we can to fill in the missing values
      - Imputation with K-nearest neighbors using Hamming distance for categorical data
      - Imputation with K-nearest neighbors using Euclidean distance for continuous data
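A minimal sketch of the categorical option on this slide: fill a missing value with the most common value among the k nearest complete rows, where "nearest" means the fewest mismatched categories (Hamming distance). The records, column order, and `k=2` are invented for illustration, and ignoring positions that are missing in either row is an assumption that makes sparse rows look closer than they are. For the continuous side, scikit-learn's `KNNImputer` implements K-nearest-neighbor imputation with a NaN-aware Euclidean distance.

```python
from collections import Counter


def hamming(a, b, skip):
    """Count mismatched features, ignoring column `skip` and any position missing in either row."""
    return sum(
        1
        for i, (x, y) in enumerate(zip(a, b))
        if i != skip and x is not None and y is not None and x != y
    )


def knn_impute_categorical(rows, target, k=3):
    """Fill None in column `target` with the most common value among the k nearest donor rows."""
    donors = [r for r in rows if r[target] is not None]
    filled = []
    for row in rows:
        if row[target] is not None:
            filled.append(row)
            continue
        # Stable sort: ties in distance keep the donors' original order.
        nearest = sorted(donors, key=lambda d: hamming(row, d, target))[:k]
        vote = Counter(d[target] for d in nearest).most_common(1)[0][0]
        filled.append(row[:target] + (vote,) + row[target + 1:])
    return filled


# Invented records: (race/ethnicity, dependents, employment status); one status is missing.
records = [
    ("Hispanic", "3+", "part-time"),
    ("Hispanic", "3+", "part-time"),
    ("White", "0-1", "full-time"),
    ("White", "0-1", "full-time"),
    ("Hispanic", "3+", None),
]
result = knn_impute_categorical(records, target=2, k=2)
print(result[-1])  # ('Hispanic', '3+', 'part-time')
```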
  • 25. Jeremy Anderson, Associate Vice Chancellor of Strategic Analytics, jeremy.anderson@dcccd.edu Dillon Lu, Data Analyst, DLu@dcccd.edu

Editor's Notes

  1. Refer to the recent Dallas Morning News article and its framing. Give a brief overview focused on two components: data and demographics by core populations, and interventions to help support all students.
  2. You have a population in n-dimensional space. Your model breaks it into subpopulations that are self-similar and distinct from the other subpopulations. There are many algorithms that attack this basic task in different ways.
  3. The idea with all of these kinds of research questions is that you take a very large population and break it down into smaller groups, then customize the messaging to fit each cluster's persona. This is used a lot in the marketing world to create customer personas, but it applies to any kind of “product,” such as academic programs and student services.
  4. The case count was fairly high: 45k+ female students who had enrolled in the prior three terms, had not attained a credential or transferred, and were not enrolled for the Fall. Some variables, like income and household size, are largely missing if we rely on FAFSA alone, because only about half of our students complete it. Instead, we supplement FAFSA with a homegrown Student Information Profile (SIP) that goes out once a year and asks those and other questions. Part of data preparation is merging the two data streams, which is also a challenge because these values are binned as categories in the SIP but are continuous in FAFSA. The other variables are highly available and mostly clean, so very little cleanup was necessary. Beyond that, we do some regular feature engineering around membership in special populations like athletes, international students, foster students, and veterans. That's code we've written and reuse across projects to save time. After the data prep, I stepped back and saw that the features under consideration were mixed: some were binary and others categorical or continuous. For that reason, and because of the size of the data, I chose the Two-Step clustering algorithm in SPSS: it works well under both conditions, and, as a social scientist by training, I am most familiar with SPSS as an analysis tool. With the data ready, I experimented with the features to maximize the evaluation score I was seeing. I loaded them all in at once, then looked at the cluster overlap charts, which are a feature in SPSS. I took out the ones that didn't show strong separation and was left with the features that would be most useful and impactful for final modeling.
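One way to merge the two income streams described above is to push FAFSA's continuous values down into the SIP's categories (the reverse direction is not recoverable from binned answers), then coalesce the two sources. This is a hypothetical sketch: `SIP_EDGES` and `SIP_LABELS` are placeholder bins, and preferring FAFSA whenever both sources answer is an assumption, not necessarily the rule the team used.

```python
from bisect import bisect_right
from typing import Optional

# Hypothetical SIP income categories: lower edges of bins 2..4, and all four labels.
SIP_EDGES = [20_000, 40_000, 60_000]
SIP_LABELS = ["<20k", "20k-40k", "40k-60k", "60k+"]


def to_sip_bin(income: float) -> str:
    """Map a continuous FAFSA income onto the SIP's categorical bins."""
    return SIP_LABELS[bisect_right(SIP_EDGES, income)]


def harmonized_income(fafsa_income: Optional[float], sip_bin: Optional[str]) -> Optional[str]:
    """Prefer the FAFSA value (binned to match the SIP); fall back to the SIP answer."""
    if fafsa_income is not None:
        return to_sip_bin(fafsa_income)
    return sip_bin  # still None when neither source has income


print(harmonized_income(35_000, None))  # 20k-40k
print(harmonized_income(None, "60k+"))  # 60k+
print(harmonized_income(None, None))    # None
```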
  5. I chose SPSS because it's my bread and butter as a social scientist by trade, and it has a couple of built-in options for clustering. In step 1, the similarity score, based on the distance between the case and the node values, is the determining factor. If the distance is above the algorithm's threshold, so that the node would spread too wide to incorporate the case, the data point goes into a newly created node. In step 2, the nodes from the tree start to get clumped together, also based on their distances. Many different cluster counts are tried and evaluated automatically with one of the clustering criteria you can choose in the SPSS model settings. Ultimately, you get a set of the best clusters for the data: the model picks the number of clusters automatically through the cluster tree creation and the evaluation of the agglomerations.
  6. The silhouette measure gave me a sense of how self-similar the points in each cluster were and how different each cluster was from the others. Scores ranged from 0.6 to 0.7, which falls in the good range. I was able to click into the scoring for details. That's the clusters matrix, which shows the clusters as columns and the features used to build them as rows. You can click one of the cells to see that feature's overall distribution, and what's present in the cluster, in the Cluster Distribution visual. You can select multiple clusters to compare them visually in the Cluster Comparison.
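The silhouette coefficient combines the two ideas in that note. For a point i, a(i) is its mean distance to the other members of its own cluster (cohesion), b(i) is its mean distance to the members of the nearest other cluster (separation), and s(i) = (b(i) - a(i)) / max(a(i), b(i)); the overall score is the mean of s(i), ranging from -1 to 1. The plain-Python sketch below implements that textbook definition with Euclidean distance on invented points. SPSS's reported silhouette measure may be computed somewhat differently, so treat this as an illustration of the concept rather than a reproduction of the SPSS number.

```python
import math


def silhouette(points, labels):
    """Mean silhouette coefficient with Euclidean distance; needs at least two clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # common convention for a singleton cluster
            continue
        # a(i): mean distance to the other members of its own cluster (cohesion).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b(i): mean distance to the members of the nearest other cluster (separation).
        b = min(
            sum(math.dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return sum(scores) / len(scores)


points = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(silhouette(points, [0, 0, 1, 1]))  # about 0.9: tight, well-separated clusters
print(silhouette(points, [0, 1, 0, 1]))  # negative: points sit closer to the other cluster
```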
  7. The clusters: Hispanic, Black, or White, with 3-4 dependents, employed part-time; Hispanic, 3 dependents, employed full-time; Black or White, 1-2 dependents, employed full-time. The overriding pattern was people of color (POC) with more dependents, especially those employed less than full-time. The hypothesis there was that we should focus messaging for that group on emergency aid, whereas full-time workers with fewer dependents might benefit from an awareness campaign about our childcare assistance. Really, though, we needed to test these assumptions. We used these clusters to create a call list of ~350 students, with the aim of talking to 50 (10-15 per cluster). The phone interviews were short, about 10 minutes, and guided by a set of template questions that our Student Success Research team and our student success staff developed together. The qualitative discussions left us with a collection of themes for each cluster and some prospective talking points to use in the text campaign. In the end, 5,600 students responded to the text campaign and re-enrolled, equating to an 18% response rate.
  8. Before we take a quick tour of some other clustering algorithms and their strengths and weaknesses, let's pause for one or two comments or questions about our first example. We will also have some time at the end for other questions.
  9. Today, we're looking at three very common categories of clustering algorithms, though there are others suited to other types of challenges. We already covered Two-Step, which is similar to BIRCH, in Example 1. The point there is that it blends the approaches of the other algorithm types listed to get the best of both worlds. For comparison, we'll now look at the other two common approaches: the K-algorithms and hierarchical clustering.
  10. The premise of all three K-algorithms is that you pick the number of clusters, K. Rather than picking at random, two good ways to narrow in on the best K for the data are the elbow method (for this data, somewhere between 6 and 8 is worth trying) and the silhouette coefficient. K-means and the silhouette coefficient are available in Python's scikit-learn package; K-medoids and K-modes come from companion packages such as scikit-learn-extra and kmodes. The methods are also available in R, but it takes a few packages.
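Both K-selection methods can be scripted in a few lines with scikit-learn. This sketch uses synthetic blob data (an assumption standing in for standardized student features) and records, for each K from 2 to 10, the inertia (within-cluster sum of squares), which you would plot against K to look for the elbow, alongside the silhouette coefficient. It covers K-means only.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for standardized continuous features.
X, _ = make_blobs(n_samples=1500, centers=5, n_features=4, random_state=42)

inertias, silhouettes = {}, {}
for k in range(2, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    inertias[k] = km.inertia_  # within-cluster sum of squares, for the elbow plot
    silhouettes[k] = silhouette_score(X, km.labels_)

for k in inertias:
    print(f"k={k:2d}  inertia={inertias[k]:10.1f}  silhouette={silhouettes[k]:.3f}")

print("highest silhouette at k =", max(silhouettes, key=silhouettes.get))
```

The two methods can disagree; the elbow is a visual judgment, while the silhouette gives a single number per K, so it is common to look at both.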
  11. So, how does it work, generally? K-means uses the mean of each cluster's points as its center, whereas K-medoids chooses the most representative actual case. K-medoids is less sensitive to outliers, much like using the median rather than the mean, so it may be preferable to K-means depending on your data. K-modes, meanwhile, counts the matches and mismatches between data points across the different features, but it still collapses all of this into a predetermined number of clusters that you set.
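The difference between the three center definitions shows up even in one dimension. In this minimal sketch on invented values, the mean (the K-means centroid) is dragged by a single outlier, the medoid (the K-medoids center) stays on an actual case, and the mode (the K-modes center) is the most frequent category. The medoid here uses absolute distance, which is one choice among several.

```python
from collections import Counter


def medoid(values):
    """The actual case with the smallest total absolute distance to all other cases."""
    return min(values, key=lambda c: sum(abs(c - v) for v in values))


ages = [22, 24, 25, 26, 64]  # invented ages with one outlier
centroid = sum(ages) / len(ages)
print(centroid)        # 32.2: the mean is pulled toward 64
print(medoid(ages))    # 25: the medoid is a real student

employment = ["part-time", "part-time", "full-time", "part-time", "unemployed"]
mode = Counter(employment).most_common(1)[0][0]
print(mode)            # part-time: the most frequent category
```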
  12. Match up pairs of cases based on the distance between them, then match up pairs of pairs, also based on distance. Pick the distance you're comfortable with and use it to determine the clusters. That's where hierarchical clustering differs from Two-Step and BIRCH, which do this automatically. Divisive clustering flips this on its head: it starts with all of the cases in one cluster and splits them. Both variants are available in R; Python's scikit-learn provides the agglomerative variant.
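Picking a cut distance by hand, as described above, maps directly onto SciPy's hierarchy module: `linkage` builds the full merge tree, and `fcluster` with `criterion="distance"` cuts it at a chosen height. The two synthetic groups, the Ward linkage, and the cutoff of 10.0 are illustrative assumptions; in practice the cutoff is read off the dendrogram. scikit-learn's `AgglomerativeClustering` offers the same kind of cut through `distance_threshold` with `n_clusters=None`.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(0)
# Two well-separated synthetic groups of 20 cases each, in 3 dimensions.
X = np.vstack([rng.normal(0.0, 0.5, size=(20, 3)),
               rng.normal(5.0, 0.5, size=(20, 3))])

# Agglomerative: every case starts as its own cluster, and the closest
# clusters are merged step by step (Ward linkage) until one tree remains.
Z = linkage(X, method="ward")

# Cut the tree at a chosen merge distance: merges above the cutoff are undone.
cutoff = 10.0
labels = fcluster(Z, t=cutoff, criterion="distance")
print("clusters at cutoff", cutoff, "->", len(set(labels)))
```

Lowering the cutoff yields more, smaller clusters; raising it merges them, which is the trade-off the distance cutoff line on slide 15 represents.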
  13. Divide our student body into groups so we can look at each specific group more closely and identify what they need and what we can do to help them reach their educational goals.
  14. K-modes clustering: determine the initial number of clusters, K, minimizing the cost function while keeping K relatively small. The purpose of the study is to divide our student body into large groups so we can study each group more effectively and therefore better understand our student body as a whole. We prefer generalization over looking at specific cases. We chose K = 4 in this case; move on to the next slide.
  15. This information can help us develop strategic plans that focus on a particular group of students to help them achieve their goals and be successful here at our institution.
  16. Many students with missing income end up in the second income bin because of how our dataset is designed. Revisiting our data and eliminating as many missing values as possible to obtain more complete data should greatly help distribute our students more evenly across the INCOME_BIN variable.