Winning presentation at the Lipscomb University Eighth Annual Student Scholar Symposium, in the graduate students group of the School of Information Technology and Computing.
2. Obesity is a Serious Problem
Obesity is weight that is higher than what is considered healthy for a given height
Nearly 40% of U.S. adults (approximately 94 million) are affected
Nearly 20% of U.S. children (approximately 13 million) are affected
Remarkably, according to the USDA, it is getting worse, and quickly
USDA studies indicate that by 2030 nearly one-half of the U.S. population will be obese
Obesity is linked with undesirable medical issues, including:
Heart disease
Diabetes
Cancer
The costs are also staggering, with estimates of $147 billion per year
3. Research Goals
Determine whether machine learning (ML) can be used to predict obesity for an out-of-sample observation.
If ML works, develop the most performant model(s):
Most performant single model
Most performant ensemble
Interpret the relationship between environmental factors (features) and obesity (target)
The astute amongst you have already identified a dilemma:
Generally, the most performant models are the least interpretable
We pursued a dual modeling approach
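The out-of-sample goal above can be sketched with a simple hold-out split. This is only an illustration under stated assumptions, not the study's pipeline: the data is synthetic, the target is assumed to be a binary obese / not-obese label, and the helper names (`train_test_split_indices`, `fit_logistic`, `predict`) are hypothetical. A plain logistic regression stands in for the interpretable side of a dual approach, since its coefficients can be read directly.

```python
import numpy as np

def train_test_split_indices(n_rows, test_fraction=0.2, seed=0):
    """Shuffle row indices and split them into train and held-out sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_rows)
    n_test = int(round(n_rows * test_fraction))
    return order[n_test:], order[:n_test]

def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Fit logistic regression by batch gradient descent; returns (weights, bias)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probabilities
        grad = p - y                             # gradient of log-loss w.r.t. the logit
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b

def predict(X, w, b):
    """Predict class 1 when the logit is positive (probability above 0.5)."""
    return (X @ w + b > 0).astype(int)

# Synthetic stand-in data (assumption): three features, binary target.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 3))
y = (2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=500) > 0).astype(int)

train_idx, test_idx = train_test_split_indices(len(y))
w, b = fit_logistic(X[train_idx], y[train_idx])

# Accuracy on rows the model never saw during fitting.
holdout_accuracy = (predict(X[test_idx], w, b) == y[test_idx]).mean()
print(f"hold-out accuracy: {holdout_accuracy:.2f}")
print("coefficients:", np.round(w, 2))
```

The sign and size of each coefficient give a direct, if rough, reading of how a feature relates to the target, which is the property that more performant models tend to give up.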
4. The Data
Source: United States Department of Agriculture (USDA)
Geo-centric: each observation is a state and county combination.
Wide variety of features from a broad spectrum of perspectives:
access and proximity to grocery stores
restaurant availability and expenditures
food assistance
food prices and taxes
health and physical activity
socioeconomic characteristics
Wrangling:
Removed obvious sources of data leakage
Removed rows and columns missing more than 90% of their values
Imputed remaining missing values with the median
Few true strings; coerced some features into numeric dtypes in pandas.
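The wrangling steps above could look roughly like this in pandas. It is a minimal sketch, not the authors' code: the column names, the `wrangle` helper, and the order of the steps are assumptions (coercion is done first here so that the median can be computed on numeric columns).

```python
import numpy as np
import pandas as pd

def wrangle(df, leakage_cols=(), numeric_cols=(), max_missing=0.9):
    """Drop leakage columns, coerce dtypes, drop sparse rows/columns, impute medians."""
    out = df.drop(columns=list(leakage_cols), errors="ignore").copy()

    # Coerce selected string-typed features to numbers; unparseable entries become NaN.
    for col in numeric_cols:
        if col in out.columns:
            out[col] = pd.to_numeric(out[col], errors="coerce")

    # Drop columns, then rows, whose share of missing values exceeds the threshold.
    out = out.loc[:, out.isna().mean(axis=0) <= max_missing]
    out = out.loc[out.isna().mean(axis=1) <= max_missing].copy()

    # Fill remaining gaps in numeric columns with each column's median.
    num = out.select_dtypes(include="number").columns
    out[num] = out[num].fillna(out[num].median())
    return out

# Hypothetical miniature of the county-level table.
raw = pd.DataFrame({
    "county": ["A", "B", "C", "D", None],
    "grocery_stores": ["12", "7", None, "n/a", None],
    "fast_food": [3.0, np.nan, 5.0, 10.0, np.nan],
    "mostly_empty": [np.nan] * 5,                 # 100% missing, dropped
    "target_copy": [0.30, 0.25, 0.35, 0.28, np.nan],  # stand-in leakage column
})

clean = wrangle(raw, leakage_cols=["target_copy"], numeric_cols=["grocery_stores"])
print(clean)
```

Passing the columns to coerce explicitly keeps the few true string columns, such as the county name here, from being turned into missing values.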
5. Interpretation
The research goal was to understand the relationship between features and target
There is no one perfect method
Fisher score was selected due to its broad usage and generally good performance
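For reference, one common form of the Fisher score ranks a feature by its between-class spread relative to its within-class spread: the sum over classes of n_k (mean_k - overall mean)^2, divided by the sum over classes of n_k var_k. The sketch below implements that form with NumPy. It assumes the target has been expressed as discrete classes and that every feature has nonzero within-class variance; the slides do not say which variant or implementation was used, and `fisher_score` is a hypothetical helper name.

```python
import numpy as np

def fisher_score(X, y):
    """Per-feature Fisher score: between-class scatter over within-class scatter."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for label in np.unique(y):
        Xk = X[y == label]  # rows belonging to this class
        between += len(Xk) * (Xk.mean(axis=0) - overall_mean) ** 2
        within += len(Xk) * Xk.var(axis=0)
    return between / within

# Tiny worked example: feature 0 separates the classes, feature 1 does not.
X = [[1, 5], [2, 1], [9, 4], [10, 2]]
y = [0, 0, 1, 1]
scores = fisher_score(X, y)
print(scores)  # feature 0 scores 64.0, feature 1 scores 0.0
```

A higher score means the class means sit further apart relative to the noise within each class, so features can be ranked by score to see which are most associated with the target.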
6. Performance
The research goal was to develop the most performant models:
Single model
Ensemble
LogicPlum was leveraged to focus on the findings
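To make the single-model versus ensemble distinction concrete, here is a minimal sketch of soft voting, one simple way to combine models by averaging their predicted probabilities. The slides do not say how the ensemble was constructed, so this technique, the `soft_vote` helper, and the probabilities below are all illustrative assumptions.

```python
import numpy as np

def soft_vote(probabilities, weights=None):
    """Average per-model positive-class probabilities and threshold at 0.5."""
    probs = np.asarray(probabilities, dtype=float)  # shape: (n_models, n_samples)
    avg = np.average(probs, axis=0, weights=weights)
    return avg, (avg >= 0.5).astype(int)

# Made-up probabilities from three hypothetical models for three observations.
model_probs = [
    [0.9, 0.4, 0.2],
    [0.8, 0.6, 0.1],
    [0.7, 0.3, 0.4],
]
avg, labels = soft_vote(model_probs)
print(np.round(avg, 3), labels)
```

On the second observation the models disagree (0.4, 0.6, 0.3); the averaged probability falls below 0.5, so the ensemble predicts class 0 there even though one model alone would have predicted class 1.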