© 2019 Iver Band
Chronic Absenteeism Rate Prediction (CARP)
Coursera Advanced Data Science with IBM Specialization
Capstone Project
Iver Band
December, 2019
Agenda
● Use Case
● Source Data Sets
● Architectural Overview
● Data Quality Assessment
● Data Visualization
● Data Exploration
● Initial Feature Engineering
● Model Performance Indicator
● Core Concepts
● Algorithms
● Frameworks
● Feature Engineering Experiment
● Model Performance Evaluation
● Conclusion
Chronic Absenteeism Rate Prediction (CARP)
Use Case
● Chronic absenteeism occurs when a student in grades K-12 misses 10%
or more of the school year for any reason
● It is a strong predictor of low academic achievement.
● CARP is for data scientists supporting educational and social services
administrators and policymakers in answering the following questions:
○ What demographic factors predict the rate of chronic absenteeism?
○ Can we use demographic data to predict chronic absenteeism rates?
Source Data Sets
● 2018 Chronic Absenteeism Data from California Department of Education
○ Counts of total students and chronically absent K-12 students by census tract in Los
Angeles County, California, USA
● US Census Bureau American Community Survey 2013-2017
○ Statistics on potential predictors of 2018 chronic absenteeism rates:
Median Income, Employment Status, Race, Length of Commute,
Educational Attainment, Disability, Geographic Mobility, Marital Status,
Health Insurance Status, Citizenship and Nativity, English Language Mastery
Architectural Overview
● Data Sources:
○ US Census Bureau (data.census.gov)
○ California Dept of Education (www.cde.ca.gov)
● Personal Laptop
● Dataplatform.ibm.com object store
● Jupyter Notebooks:
○ CARP-ETL: Join and otherwise prepare data
○ CARP-EXP: Explore and visualize data
○ CARP-DNN: Define, train, and test deep neural network regressor
○ CARP-DTE: Define, train, and test AdaBoost Decision Tree Regressor
○ CARP-ME: Evaluate and compare model performance
Data Quality Assessment
● Determined percentage of rows with missing or non-numeric data in
○ All 11 input data sets prior to cleansing
○ The joined data set
● Primary issue is 84% coverage of LA County census tracts
● Data about 2158 census tracts was available for modeling
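The row-level check described above can be sketched in plain Python. This is only an illustration, not code from the CARP notebooks: the sample rows and the column names (`tract`, `median_income`, `pct_disabled`) are hypothetical.

```python
def is_numeric(value):
    """Return True if value can be read as a finite number."""
    try:
        number = float(value)
    except (TypeError, ValueError):
        return False
    # Reject NaN (not equal to itself) and infinities.
    return number == number and number not in (float("inf"), float("-inf"))


def pct_bad_rows(rows, columns):
    """Percentage of rows with a missing or non-numeric value in any of `columns`."""
    if not rows:
        return 0.0
    bad = sum(1 for row in rows if not all(is_numeric(row.get(c)) for c in columns))
    return 100.0 * bad / len(rows)


# Hypothetical sample: four tracts, two of them with unusable values.
sample_rows = [
    {"tract": "101110", "median_income": "53077", "pct_disabled": "9.8"},
    {"tract": "101122", "median_income": "-", "pct_disabled": "7.1"},
    {"tract": "101210", "median_income": "38125", "pct_disabled": None},
    {"tract": "101220", "median_income": "41012", "pct_disabled": "11.4"},
]

bad_pct = pct_bad_rows(sample_rows, ["median_income", "pct_disabled"])
print(f"{bad_pct:.1f}% of rows have missing or non-numeric data")
```

Running the same function on each input data set and on the joined data set would give the before-and-after view the slide describes.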
Data Visualization
● Visualized distribution of chronic absenteeism percentages with
○ Histogram
○ Kernel Density Estimate
○ Rug plot
● Distribution is roughly normal, with positive skewness
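Positive skewness can also be checked numerically alongside the plots. The sketch below computes the Fisher-Pearson skewness coefficient (third central moment divided by the 1.5 power of the second) in plain Python; the `rates` values are hypothetical and are not taken from the CARP data.

```python
def skewness(values):
    """Fisher-Pearson coefficient of skewness (population form)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


# Hypothetical chronic absenteeism percentages with a long right tail.
rates = [4.0, 6.0, 7.0, 8.0, 8.0, 9.0, 10.0, 11.0, 13.0, 16.0, 22.0, 30.0]
skew = skewness(rates)
print(f"skewness = {skew:.2f}")  # a value above zero indicates a right tail
```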
Data Visualization
● Visualized geographic distribution of chronic absenteeism rates with a choropleth map
● Adjacent or close census tracts tend to have similar rates
Data Exploration
● Visualized relationship of top 15 correlates per r² score using pair plots with regression lines
● Read left to right, then top to bottom
● Race, educational attainment, marital status and income are the strongest predictors
● Prediction strength declines sharply after the top five predictors
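Ranking predictors by their pairwise r² with the target can be sketched as follows. The sketch assumes the pairwise score is the squared Pearson correlation; the predictor names and values are hypothetical and chosen only to make the ranking visible.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no linear signal
    return (sxy * sxy) / (sxx * syy)


def top_predictors(predictors, target, k):
    """Return the k (name, r²) pairs with the highest r², strongest first."""
    scores = {name: r_squared(values, target) for name, values in predictors.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]


# Hypothetical per-tract values; the target is the chronic absenteeism rate.
target = [5.0, 8.0, 12.0, 15.0, 21.0]
predictors = {
    "pct_bachelors_or_higher": [60.0, 45.0, 30.0, 22.0, 10.0],
    "median_income": [95.0, 70.0, 62.0, 40.0, 35.0],
    "pct_long_commute": [12.0, 30.0, 9.0, 25.0, 14.0],
}

ranked = top_predictors(predictors, target, k=2)
for name, score in ranked:
    print(f"{name}: r2 = {score:.3f}")
```

Because r² discards the sign of the correlation, a strongly negative correlate such as educational attainment ranks as high as a strongly positive one.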
Initial Feature Engineering
● Converted counts into percentages in
○ California Department of Education chronic absenteeism data set
○ Nearly all of the American Community Survey data sets
● Joined all twelve data sets by six-digit US census tract number, which is
unique within each county
● For deep neural network model only
○ Calculated r² for each predictor/target variable pair
○ Input only the top fifteen predictors
○ Scaled all data to have a mean of zero and a standard deviation of 1
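The three steps above (counts to percentages, join on tract number, scaling to zero mean and unit standard deviation) can be sketched in plain Python. The tract numbers, column names, and values are hypothetical. The standardization uses the population standard deviation, which is also what scikit-learn's `StandardScaler` uses.

```python
import math


def to_percentage(count, total):
    """Convert a count to a percentage of total; None when total is zero."""
    return 100.0 * count / total if total else None


def join_by_tract(*tables):
    """Inner-join dictionaries keyed by census tract number."""
    common = set(tables[0]).intersection(*tables[1:])
    return {
        tract: {k: v for table in tables for k, v in table[tract].items()}
        for tract in sorted(common)
    }


def standardize(values):
    """Scale values to mean 0 and (population) standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


# Hypothetical inputs keyed by six-digit tract number.
absences = {
    "101110": {"absent": 12, "enrolled": 150},
    "101122": {"absent": 30, "enrolled": 200},
    "101210": {"absent": 9, "enrolled": 90},
}
income = {
    "101110": {"median_income": 53000.0},
    "101122": {"median_income": 41000.0},
    "101300": {"median_income": 67000.0},
}

joined = join_by_tract(absences, income)  # keeps only tracts present in both
for row in joined.values():
    row["absent_pct"] = to_percentage(row["absent"], row["enrolled"])

scaled_income = standardize([row["median_income"] for row in joined.values()])
print(joined)
print(scaled_income)
```

An inner join drops tracts that are missing from any input, which is one way coverage can fall below the full set of county tracts.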
Model Performance Indicator
Coefficient of Determination (r²)
● Measures the extent to which the predicted rates reflect the total variability of the actual rates
● Disproportionately penalizes large errors
● Prevents errors with opposing signs from cancelling each other out
Formula
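For reference, the standard definition of the coefficient of determination is r² = 1 − Σ(yᵢ − ŷᵢ)² / Σ(yᵢ − ȳ)², where yᵢ are actual values, ŷᵢ are predictions and ȳ is the mean of the actual values. A minimal sketch with made-up numbers shows the two properties listed above: both sets of predictions below have the same total absolute error, but the squared residuals keep offsetting errors from cancelling and penalize the single large error more heavily.

```python
def r2_score(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)           # total sum of squares
    return 1.0 - ss_res / ss_tot


# Hypothetical values; both predictions have total absolute error 4.
actual = [10.0, 12.0, 14.0, 16.0]
offsetting = [11.0, 11.0, 15.0, 15.0]  # errors +1, -1, +1, -1 do not cancel
one_large = [10.0, 12.0, 14.0, 20.0]   # a single error of +4

print(r2_score(actual, offsetting))
print(r2_score(actual, one_large))
```

scikit-learn, which the deck lists for model scoring, provides the same measure as `sklearn.metrics.r2_score`.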
Core Concept: Neural Network
• Feed forward: prediction by computing the value of each node, layer by layer, by multiplying the values of the nodes in the previous layer by a set of weights, adding the products and a bias, and applying a nonlinear activation function
• Back propagation: adjusting the weights and biases by differentiating the loss through the feed-forward computation to determine its slope with respect to each weight and bias, and applying gradient descent (next slide)
Image from https://victorzhou.com/blog/intro-to-neural-networks/
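The feed-forward step can be sketched for a tiny network in plain Python. The 2-2-1 layout, weights, and biases below are hypothetical and chosen so the arithmetic is easy to follow; they are not the CARP model.

```python
def relu(x):
    """Nonlinear activation: pass positive values, clamp the rest to zero."""
    return x if x > 0 else 0.0


def identity(x):
    """Linear output activation, as is usual for a regression output node."""
    return x


def dense(inputs, weights, biases, activation):
    """One layer: weighted sum of inputs plus bias, then activation, per node."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


# Hypothetical parameters: 2 inputs -> 2 hidden nodes -> 1 output.
hidden_w = [[0.5, -1.0], [1.5, 2.0]]
hidden_b = [0.0, -1.0]
out_w = [[1.0, 2.0]]
out_b = [0.5]


def feed_forward(inputs):
    hidden = dense(inputs, hidden_w, hidden_b, relu)
    return dense(hidden, out_w, out_b, identity)[0]


prediction = feed_forward([2.0, 1.0])
print(prediction)
```

With input `[2.0, 1.0]` the first hidden node sums to 0.0 and is clamped by ReLU, the second evaluates to 4.0, and the output is 1.0 * 0.0 + 2.0 * 4.0 + 0.5.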
Core Concept: Gradient Descent for Neural Network Model Training
• Imagine trying to find the deepest valley in a deep fog
• Altitude is the loss function, which compares the predicted value to the actual value
• The size of the step you take is the learning rate
• Descending the valley is gradient descent
• Your location in n-dimensional space is determined by using model weights and the loss function as coordinates
• The model-fitting algorithm repeatedly determines the gradient (slope) of the loss and adjusts model weights and biases to descend the gradient
Text and picture adapted from
https://mc.ai/stochastic-gradient-descent-in-plain-english/
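The loop described above can be sketched on the simplest possible model, a line y = w·x + b fitted with mean squared error loss. This is plain batch gradient descent on made-up data, not the Adam optimizer or the network used in CARP; the learning rate and step count are illustrative assumptions.

```python
def mse_gradients(w, b, xs, ys):
    """Gradient of mean squared error with respect to weight w and bias b."""
    n = len(xs)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    return grad_w, grad_b


def gradient_descent(xs, ys, learning_rate=0.05, steps=2000):
    """Repeatedly step against the gradient, starting from w = b = 0."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        grad_w, grad_b = mse_gradients(w, b, xs, ys)
        w -= learning_rate * grad_w  # step size is set by the learning rate
        b -= learning_rate * grad_b
    return w, b


# Hypothetical data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = gradient_descent(xs, ys)
print(f"w = {w:.4f}, b = {b:.4f}")
```

A learning rate that is too large makes the steps overshoot the valley floor and diverge; one that is too small makes the descent needlessly slow.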
Core Concept: CART* for Decision Tree Modeling
Do I play golf today? Yes or No?
…
Build next leaf with decision criteria with lowest Gini score:
*Classification and Regression Trees
Adapted from: https://sefiks.com/2018/08/27/a-step-by-step-cart-decision-tree-example/
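Choosing the split with the lowest Gini score can be sketched as below. The six rows are a hypothetical miniature of the golf example, not the data set from the cited article, and the feature names are illustrative.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum of squared class shares."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def weighted_gini(rows, feature, target):
    """Gini impurity after splitting on `feature`, weighted by group size."""
    groups = {}
    for row in rows:
        groups.setdefault(row[feature], []).append(row[target])
    n = len(rows)
    return sum(len(group) / n * gini(group) for group in groups.values())


def best_feature(rows, features, target):
    """Feature whose split yields the lowest weighted Gini impurity."""
    return min(features, key=lambda f: weighted_gini(rows, f, target))


# Hypothetical miniature data set.
rows = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "overcast", "windy": "no", "play": "yes"},
    {"outlook": "overcast", "windy": "yes", "play": "yes"},
    {"outlook": "rain", "windy": "no", "play": "yes"},
    {"outlook": "rain", "windy": "yes", "play": "no"},
]

choice = best_feature(rows, ["outlook", "windy"], "play")
print(choice)
```

Gini impurity applies to class labels. For a numeric target such as an absenteeism rate, CART implementations typically rank splits by a squared-error criterion instead.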
Algorithms
Feed-Forward Neural Network
● Dimensions of each layer
○ 15-->11-->7-->1
● ReLU activation for all nodes
○ f(x) = 0 if x <= 0
○ f(x) = x otherwise
● Adam Optimizer
○ 0.01 learning rate
○ 0.001 decay
● Mean squared error loss metric
● Batch size = 12
● 25-40 epochs (passes through data)
● Early stopping for training runs that don’t converge
● Attributes chosen after wide experimentation
Decision Tree Ensemble
● AdaBoost with decision tree as weak learner
○ Repeatedly builds decision trees, reweighting the training data so that each new tree concentrates on the weakest predictions so far
○ Squared error loss metric
● Method chosen after experimentation with linear regression and standalone decision tree regression, as well as other decision tree ensembles, such as random forest
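A decision tree ensemble of the kind described above can be sketched with scikit-learn, which the deck lists for this purpose. Only the choice of AdaBoost, the decision tree weak learner, and the squared-error loss come from the slide. The tree depth, number of estimators, random seed, and synthetic data are assumptions made for illustration and are not the settings of the CARP notebooks.

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the tract data: 300 rows, 3 predictors (assumption).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(300, 3))
y = 20.0 * X[:, 0] + 10.0 * X[:, 1] ** 2 + rng.normal(0.0, 0.5, size=300)

# AdaBoost with a shallow decision tree as the weak learner.
# max_depth=4 and n_estimators=50 are illustrative, not from the source.
model = AdaBoostRegressor(
    DecisionTreeRegressor(max_depth=4),
    n_estimators=50,
    loss="square",  # squared-error loss when reweighting training samples
    random_state=0,
)

# Simple 80/20 hold-out: fit on the first 240 rows, score on the last 60.
model.fit(X[:240], y[:240])
test_r2 = model.score(X[240:], y[240:])  # score() returns r² for regressors
print(f"held-out r2 = {test_r2:.3f}")
```

The weak learner is passed positionally because its keyword name differs between scikit-learn releases.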
Frameworks
All Python-based
Purpose | Frameworks
Interactive Development | Jupyter within IBM Watson Studio
Data Manipulation and Computation | NumPy, Pandas, GeoPandas
Data Visualization | Matplotlib, Seaborn, Descartes
Neural Network Modeling | Keras with TensorFlow backend
Data Scaling, Model Scoring and Decision Tree Ensemble Modeling | Scikit-Learn
Feature Engineering Experiment
● Choropleth map shows adjacent census tracts tend to have similar chronic
absenteeism rates
● Many adjacent or nearby census tracts in LA County appear to have similar
or sequential census tract numbers
● Therefore, would including the six-digit census tract number as a predictor
variable, in addition to its role as a key, improve model performance?
● Unfortunately not. The census tract number is among the weakest of the
predictor variables examined, based on
○ The pairwise r² score
○ No performance change for the Decision Tree Ensemble model
Model Performance Evaluation
● Tested both models with five rounds of randomized shuffle-split with 80/20 train-test ratio
● Chose randomized shuffle-split over k-fold cross-validation to retain finer control over size of split
○ Before tuning, got good performance only with 90/10 train-test split
○ Future work could use k-fold cross-validation
● The Decision Tree Ensemble is clearly a stronger predictor than the Neural Network
● To avoid outliers, chose iteration with median performance of stronger model for further evaluation
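The evaluation procedure above can be sketched in plain Python: five randomized 80/20 splits, then selection of the round whose score is the median. scikit-learn offers a comparable splitter as `ShuffleSplit`. The sketch below is a stand-in, and the per-round scores are hypothetical placeholders, not results from the CARP models.

```python
import random
import statistics


def shuffle_split(n_samples, test_fraction, n_rounds, seed):
    """Yield (train_indices, test_indices) for independent random splits."""
    rng = random.Random(seed)
    n_test = round(n_samples * test_fraction)
    for _ in range(n_rounds):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        yield indices[n_test:], indices[:n_test]


def median_round(scores):
    """Index of the round with the median score (expects an odd number of rounds)."""
    return scores.index(statistics.median(scores))


# 2158 tracts were available for modeling; the split size follows from 80/20.
splits = list(shuffle_split(n_samples=2158, test_fraction=0.2, n_rounds=5, seed=0))
for train_idx, test_idx in splits:
    print(len(train_idx), len(test_idx))

# Hypothetical per-round r² scores, used only to show the selection step.
placeholder_scores = [0.61, 0.58, 0.66, 0.63, 0.60]
chosen = median_round(placeholder_scores)
print(f"round {chosen} has the median score")
```

Unlike k-fold splitting, the test fraction here is a free parameter, so moving from 90/10 to 80/20 does not change the number of rounds.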
Model Performance Evaluation
● Compared residuals of predictions using superimposed histograms
● For about 90% of test cases in its median-performing iteration, Decision Tree Ensemble predicted chronic absenteeism rates within 5%
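The share of predictions that land within a tolerance of the actual rate can be computed directly from the residuals. The sketch assumes "within 5%" means within 5 percentage points of the actual rate; the actual and predicted values are hypothetical.

```python
def share_within(actual, predicted, tolerance=5.0):
    """Percentage of cases whose residual is within +/- tolerance."""
    hits = sum(1 for a, p in zip(actual, predicted) if abs(p - a) <= tolerance)
    return 100.0 * hits / len(actual)


# Hypothetical chronic absenteeism rates (percent) for ten test tracts.
actual = [8.0, 12.0, 15.0, 20.0, 9.0, 11.0, 25.0, 14.0, 10.0, 18.0]
predicted = [9.5, 10.0, 16.0, 13.0, 9.0, 12.5, 19.0, 15.0, 11.0, 17.0]

share = share_within(actual, predicted)
print(f"{share:.0f}% of predictions are within 5 percentage points")
```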
Model Performance Evaluation
● Used a scatter plot with an identity line and +/- 5% lines to visualize deviations of predicted chronic absenteeism rates from actual values
● Result is consistent with residual histograms
Conclusion
● What demographic factors predict the rate of chronic absenteeism?
○ For the years studied, race, educational attainment, marital status and income are the
strongest predictors of chronic absenteeism in LA County, California, USA
● Can we use public demographic data to predict chronic absenteeism rates?
○ Yes, for the years studied, well enough to identify areas with highest future risk and,
perhaps, implement mitigating measures
● Opportunities for future work
○ Expand scope to include additional geographic areas, years, and predictor variables
○ Further automate data extraction and transformation
○ Consider additional algorithms, e.g. PCA, XGBoost, Polynomial Regression
○ Explore outliers where rate is more or less than expected
○ Explore effects of programs to mitigate chronic absenteeism
● Links
○ https://github.com/IverBand/carp
○ https://www.coursera.org/specializations/advanced-data-science-ibm

Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
Decoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in ActionDecoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in Action
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data Analyst
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 

Chronic Absenteeism Rate Prediction: A Data Science Case Study

  • 1. © 2019 Iver Band
      Chronic Absenteeism Rate Prediction (CARP)
      Coursera Advanced Data Science with IBM Specialization Capstone Project
      Iver Band, December 2019
  • 2. Agenda
      ● Use Case
      ● Source Data Sets
      ● Architectural Overview
      ● Data Quality Assessment
      ● Data Visualization
      ● Data Exploration
      ● Initial Feature Engineering
      ● Model Performance Indicator
      ● Core Concepts
      ● Algorithms
      ● Frameworks
      ● Feature Engineering Experiment
      ● Model Performance Evaluation
      ● Conclusion
  • 3. Chronic Absenteeism Rate Prediction (CARP) Use Case
      ● Chronic absenteeism occurs when a student in grades K-12 misses 10% or more of the school year for any reason
      ● It is a strong predictor of low academic achievement
      ● CARP is for data scientists supporting educational and social services administrators and policymakers in answering the following questions:
        ○ What demographic factors predict the rate of chronic absenteeism?
        ○ Can we use demographic data to predict chronic absenteeism rates?
  • 4. Source Data Sets
      ● 2018 Chronic Absenteeism Data from California Department of Education
        ○ Counts of total students and chronically absent K-12 students by census tract in Los Angeles County, California, USA
      ● US Census Bureau American Community Survey 2013-2017
        ○ Statistics on potential predictors of 2018 chronic absenteeism rates:
          - Median Income
          - Employment Status
          - Race
          - Length of Commute
          - Educational Attainment
          - Disability
          - Geographic Mobility
          - Marital Status
          - Health Insurance Status
          - Citizenship and Nativity
          - English Language Mastery
  • 5. Architectural Overview
      ● Data Sources:
        ○ US Census Bureau (data.census.gov)
        ○ California Dept of Education (www.cde.ca.gov)
      ● Personal Laptop
      ● Dataplatform.ibm.com object store
      ● Jupyter Notebooks:
        ○ CARP-ETL: Join and otherwise prepare data
        ○ CARP-EXP: Explore and visualize data
        ○ CARP-DNN: Define, train, and test deep neural network regressor
        ○ CARP-DTE: Define, train, and test AdaBoost Decision Tree Regressor
        ○ CARP-ME: Evaluate and compare model performance
  • 6. Data Quality Assessment
      ● Determined percentage of rows with missing or non-numeric data in
        ○ All 11 input data sets prior to cleansing
        ○ The joined data set
      ● Primary issue is 84% coverage of LA County census tracts
      ● Data on 2158 census tracts was available for modeling
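
The row-level check described on this slide could be sketched as follows. This is a minimal illustration, not the project's code: the function name `pct_rows_with_bad_values` and the sample rows are hypothetical, and it assumes a value counts as "bad" when it is missing, is NaN, or cannot be parsed as a number.

```python
def pct_rows_with_bad_values(rows):
    """Return the percentage of rows that contain a missing or non-numeric value."""
    def is_numeric(value):
        if value is None:
            return False
        try:
            x = float(value)
        except (TypeError, ValueError):
            return False
        return x == x  # NaN is the only float that is not equal to itself

    if not rows:
        return 0.0
    bad = sum(1 for row in rows if not all(is_numeric(v) for v in row))
    return 100.0 * bad / len(rows)


# Hypothetical sample: two of the four rows have an empty or placeholder value.
sample_rows = [
    ["1200", "35.5"],
    ["980", ""],
    ["-", "12.0"],
    ["450", "7"],
]
bad_pct = pct_rows_with_bad_values(sample_rows)
print(f"{bad_pct:.1f}% of rows need cleansing")
```

Running the same check on each input data set and again on the joined data set would give the before and after figures the slide refers to.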
  • 7. Data Visualization
      ● Visualized distribution of chronic absenteeism percentages with
        ○ Histogram
        ○ Kernel Density Estimate
        ○ Rug plot
      ● Distribution is roughly normal, with positive skewness
  • 8. Data Visualization
      ● Visualized geographic distribution of chronic absenteeism rates with a choropleth map
      ● Adjacent or close census tracts tend to have similar rates
  • 9. Data Exploration
      ● Visualized relationship of top 15 correlates per r² score using pair plots with regression lines
      ● Read left to right, then top to bottom
      ● Race, educational attainment, marital status and income are the strongest predictors
      ● Prediction strength declines sharply after the top five predictors
  • 10. Initial Feature Engineering
      ● Converted counts into percentages in
        ○ California Department of Education chronic absenteeism data set
        ○ Nearly all of the American Community Survey data sets
      ● Joined all twelve data sets by six-digit US census tract number, which is unique within each county
      ● For deep neural network model only
        ○ Calculated r² for each predictor/target variable pair
        ○ Input only the top fifteen predictors
        ○ Scaled all data to have a mean of zero and a standard deviation of 1
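
The feature engineering steps on this slide can be sketched in plain Python. This is an illustration under stated assumptions rather than the notebook code: the pairwise r² is taken to be the squared Pearson correlation between one predictor and the target, scaling uses the population standard deviation, and the column names and values are made up.

```python
from statistics import mean, pstdev


def to_percent(count, total):
    """Convert a raw count into a percentage of the total."""
    return 100.0 * count / total if total else 0.0


def pairwise_r2(xs, ys):
    """Squared Pearson correlation between one predictor and the target."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column explains nothing
    return sxy * sxy / (sxx * syy)


def standardize(xs):
    """Rescale values to a mean of zero and a standard deviation of one."""
    mu, sigma = mean(xs), pstdev(xs)
    return [(x - mu) / sigma for x in xs]


def top_k_predictors(columns, target, k=15):
    """Return the names of the k predictors with the highest pairwise r2."""
    ranked = sorted(columns, key=lambda name: pairwise_r2(columns[name], target), reverse=True)
    return ranked[:k]


# Hypothetical predictor columns and target rates for four census tracts.
columns = {
    "pct_a": [10, 20, 30, 40],
    "pct_b": [5, 3, 8, 1],
    "pct_c": [40, 30, 20, 11],
}
target = [12, 22, 32, 42]

selected = top_k_predictors(columns, target, k=2)
scaled = {name: standardize(columns[name]) for name in selected}
print(selected)
```

Ranking on r² rather than on the signed correlation keeps strongly negative predictors, such as `pct_c` here, in the selected set.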
  • 11. Model Performance Indicator: Coefficient of Determination (r²)
      ● Measures the extent to which the predicted rates reflect the total variability of the actual rates
      ● Disproportionately penalizes large errors
      ● Prevents errors with opposing signs from cancelling each other out
      ● Formula (standard definition): r² = 1 − Σ(yᵢ − ŷᵢ)² / Σ(yᵢ − ȳ)², where yᵢ is an actual rate, ŷᵢ the corresponding predicted rate, and ȳ the mean of the actual rates
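
As a worked example of this indicator, the sketch below computes the coefficient of determination by hand, assuming the standard definition: one minus the ratio of the residual sum of squares to the total sum of squares. The function name and the sample rates are illustrative only.

```python
def coefficient_of_determination(actual, predicted):
    """r2 = 1 - (residual sum of squares) / (total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical actual and predicted chronic absenteeism rates (percent).
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]

r2 = coefficient_of_determination(actual, predicted)
print(round(r2, 3))
```

Squaring each residual is what makes large errors count disproportionately and keeps positive and negative errors from cancelling. A model that always predicts the mean of the actual rates scores 0, and a perfect model scores 1.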
  • 12. Core Concept: Neural Network (Feed Forward and Back Propagation)
      ● Feed forward: prediction by computing the value of each node, layer by layer, by multiplying the nodes in the previous layer by a set of weights, adding the products and a bias, and applying a nonlinear activation function
      ● Back propagation: adjusting the weights and biases by differentiating the loss, computed through the feed-forward function, to determine its slope with respect to each weight and bias, and applying gradient descent (next slide)
      Image from https://victorzhou.com/blog/intro-to-neural-networks/
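
The feed-forward computation can be sketched in a few lines of plain Python. This is a toy illustration, not the project's Keras model: the 2-2-1 network, its weights, and its biases are made up, and ReLU is assumed as the activation for every node.

```python
def relu(x):
    """Rectified linear unit: 0 for non-positive inputs, x otherwise."""
    return x if x > 0 else 0.0


def dense(inputs, weights, biases):
    """One layer: weighted sum of the inputs plus a bias, then ReLU, per node."""
    return [
        relu(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def feed_forward(inputs, layers):
    """Propagate the inputs through each (weights, biases) layer in order."""
    values = inputs
    for weights, biases in layers:
        values = dense(values, weights, biases)
    return values


# Hypothetical network: 2 inputs -> 2 hidden nodes -> 1 output node.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, -1.0]),  # hidden layer
    ([[2.0, 1.0]], [0.5]),                     # output layer
]
output = feed_forward([3.0, 1.0], layers)
print(output)  # hidden values are 2.0 and 1.0, so the output is [5.5]
```

Each row of a weight matrix holds the weights feeding one node, so the shape of each matrix fixes the layer dimensions.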
  • 13. Core Concept: Gradient Descent for Neural Network Model Training
      ● Imagine trying to find the deepest valley in a deep fog
      ● Altitude is the loss function, which compares the predicted value to the actual value
      ● The size of the step you take is the learning rate
      ● Descending the valley is gradient descent
      ● Your location in n-dimensional space is determined by using model weights and the loss function as coordinates
      ● The model-fitting algorithm repeatedly determines the gradient (slope) of the loss and adjusts model weights and biases to descend the gradient
      Text and picture adapted from https://mc.ai/stochastic-gradient-descent-in-plain-english/
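
To make the analogy concrete, here is a minimal sketch of plain batch gradient descent fitting a one-variable linear model with a mean squared error loss. It is not the Adam optimizer used in the project; the function name, learning rate, step count, and data are all illustrative assumptions.

```python
def fit_line(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = w * x + b by repeatedly stepping down the gradient of the MSE loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Partial derivatives of mean((w * x + b - y) ** 2) with respect to w and b.
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Step against the gradient; the learning rate sets the step size.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Hypothetical data generated from y = 2x + 1.
w, b = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(round(w, 4), round(b, 4))
```

A larger learning rate takes bigger steps and may overshoot the valley floor, while a smaller one needs more steps to get there.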
  • 14. Core Concept: CART* for Decision Tree Modeling
      ● Do I play golf today? Yes or No? …
      ● Build next leaf with decision criteria with lowest Gini score
      Adapted from: https://sefiks.com/2018/08/27/a-step-by-step-cart-decision-tree-example/
      *Classification and Regression Trees
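
The "lowest Gini score" rule can be sketched as follows. The label counts are illustrative values in the style of the classic play-golf example, not data taken from the slide, and the helper names are hypothetical.

```python
from collections import Counter


def gini_impurity(labels):
    """1 minus the sum of squared class proportions; 0 means a pure group."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def weighted_gini(groups):
    """Gini score of a split: impurity of each group weighted by its size."""
    total = sum(len(group) for group in groups)
    return sum(len(group) / total * gini_impurity(group) for group in groups)


# Hypothetical candidate splits of 14 days, each listing the labels per branch.
candidate_splits = {
    "outlook": [["yes"] * 2 + ["no"] * 3, ["yes"] * 4, ["yes"] * 3 + ["no"] * 2],
    "wind": [["yes"] * 6 + ["no"] * 2, ["yes"] * 3 + ["no"] * 3],
}
scores = {name: weighted_gini(groups) for name, groups in candidate_splits.items()}
best_split = min(scores, key=scores.get)
print(best_split, round(scores[best_split], 3))
```

The tree-building step evaluates every candidate criterion this way, splits on the one with the lowest score, and then repeats within each branch.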
  • 15. Algorithms
      Feed-Forward Neural Network
      ● Dimensions of each layer
        ○ 15-->11-->7-->1
      ● ReLU activation for all nodes
        ○ f(x) = 0 if x <= 0
        ○ f(x) = x otherwise
      ● Adam Optimizer
        ○ 0.01 learning rate
        ○ 0.001 decay
        ○ Mean squared error loss metric
      ● Batch size = 12
      ● 25-40 epochs (passes through data)
      ● Early stopping for training runs that don’t converge
      ● Attributes chosen after wide experimentation
      Decision Tree Ensemble
      ● AdaBoost with decision tree as weak learner
        ○ Repeatedly builds decision trees, weighting them to improve weakest predictions
        ○ Squared error loss metric
      ● Method chosen after experimentation with linear regression and standalone decision tree regression, as well as other decision tree ensembles, such as random forest
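
To give a sense of the size of the 15-->11-->7-->1 network, the sketch below counts its trainable parameters and restates the ReLU definition from the slide. It assumes fully connected layers with one bias per node, which the slides imply but do not state outright.

```python
def relu(x):
    """f(x) = 0 if x <= 0, f(x) = x otherwise."""
    return x if x > 0 else 0.0


def count_parameters(layer_sizes):
    """Weights (inputs times outputs) plus one bias per output node, per layer."""
    return sum(
        n_in * n_out + n_out
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


layer_sizes = [15, 11, 7, 1]
total_parameters = count_parameters(layer_sizes)
relu_samples = [relu(x) for x in (-2.0, 0.0, 3.5)]

# (15*11 + 11) + (11*7 + 7) + (7*1 + 1) = 176 + 84 + 8
print(total_parameters)
```

With only a few hundred parameters, the network is small relative to the roughly two thousand census tracts available for modeling.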
  • 16. Frameworks (all Python-based)
      Purpose: Frameworks
      ● Interactive Development: Jupyter within IBM Watson Studio
      ● Data Manipulation and Computation: Numpy, Pandas, Geopandas
      ● Data Visualization: Matplotlib, Seaborn, Descartes
      ● Neural Network Modeling: Keras with TensorFlow backend
      ● Data Scaling, Model Scoring and Decision Tree Ensemble Modeling: Scikit-Learn
  • 17. Feature Engineering Experiment
      ● Choropleth map shows adjacent census tracts tend to have similar chronic absenteeism rates
      ● Many adjacent or nearby census tracts in LA County appear to have similar or sequential census tract numbers
      ● Therefore, would including the six-digit census tract number as a predictor variable, in addition to its role as a key, improve model performance?
      ● Unfortunately not. The census tract number is among the weakest of the predictor variables examined, based on
        ○ The pairwise r² score
        ○ No performance change for the Decision Tree Ensemble model
  • 18. Model Performance Evaluation
      ● Tested both models with five rounds of randomized shuffle-split with 80/20 train-test ratio
      ● Chose randomized shuffle-split over cross-validation to retain finer control over size of split
        ○ Before tuning, got good performance only with 90/10 train-test split
        ○ Future work could use cross-validation
      ● The Decision Tree Ensemble is clearly a stronger predictor than the Neural Network
      ● To avoid outliers, chose iteration with median performance of stronger model for further evaluation
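
The evaluation scheme on this slide can be sketched without any libraries. This is an approximation of the idea, not the project's code: the helper names, the seed, and the five r² scores are hypothetical, and only the row count of 2158 comes from the slides.

```python
import random
from statistics import median


def shuffle_split(n_rows, n_rounds=5, test_fraction=0.2, seed=0):
    """Yield (train_indices, test_indices) for independent random splits."""
    rng = random.Random(seed)
    n_test = round(n_rows * test_fraction)
    for _ in range(n_rounds):
        indices = list(range(n_rows))
        rng.shuffle(indices)
        yield indices[n_test:], indices[:n_test]


def median_round(scores):
    """Index of the round whose score is closest to the median score."""
    target = median(scores)
    return min(range(len(scores)), key=lambda i: abs(scores[i] - target))


# 2158 census tracts were available for modeling.
splits = list(shuffle_split(2158))

# Hypothetical r2 scores, one per round, standing in for real model results.
round_scores = [0.61, 0.58, 0.66, 0.63, 0.60]
chosen = median_round(round_scores)
print(len(splits[0][0]), len(splits[0][1]), chosen)
```

Unlike k-fold cross-validation, each round reshuffles all rows, so the test fraction can be set freely and a given tract may land in the test set in more than one round.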
  • 19. Model Performance Evaluation
      ● Compared residuals of predictions using superimposed histograms
      ● For about 90% of test cases in its median-performing iteration, Decision Tree Ensemble predicted chronic absenteeism rates within 5%
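
A statistic of this kind could be computed as sketched below. The sketch assumes that "within 5%" means an absolute difference of at most five percentage points between predicted and actual rates, which the slide does not spell out. The function name and sample values are illustrative.

```python
def share_within(actual, predicted, tolerance=5.0):
    """Fraction of cases whose absolute residual is at most the tolerance."""
    hits = sum(1 for a, p in zip(actual, predicted) if abs(p - a) <= tolerance)
    return hits / len(actual)


# Hypothetical actual and predicted chronic absenteeism rates (percent).
actual_rates = [8.0, 12.5, 20.0, 15.0, 30.0]
predicted_rates = [10.0, 11.0, 27.0, 14.0, 26.0]

share = share_within(actual_rates, predicted_rates)
print(f"{share:.0%} of predictions fall within 5 points")
```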
  • 20. Model Performance Evaluation
      ● Used a scatter plot with an identity line and +/- 5% lines to visualize deviations of predicted chronic absenteeism rates from actual values
      ● Result is consistent with residual histograms
  • 21. Conclusion
      ● What demographic factors predict the rate of chronic absenteeism?
        ○ For the years studied, race, educational attainment, marital status and income are the strongest predictors of chronic absenteeism in LA County, California, USA
      ● Can we use public demographic data to predict chronic absenteeism rates?
        ○ Yes, for the years studied, well enough to identify areas with highest future risk and, perhaps, implement mitigating measures
      ● Opportunities for future work
        ○ Expand scope to include additional geographic areas, years, and predictor variables
        ○ Further automate data extraction and transformation
        ○ Consider additional algorithms, e.g. PCA, XGBoost, Polynomial Regression
        ○ Explore outliers where rate is more or less than expected
        ○ Explore effects of programs to mitigate chronic absenteeism
      ● Links
        ○ https://github.com/IverBand/carp
        ○ https://www.coursera.org/specializations/advanced-data-science-ibm