Social Capital Deserts: Obesity Surveillance using a Location-Based Social Network
Hongyang Bai
University of Illinois at
Urbana-Champaign
hbai4@illinois.edu
Rumi Chunara
New York University
rumi.chunara@nyu.edu
Lav R. Varshney
University of Illinois at
Urbana-Champaign
varshney@illinois.edu
ABSTRACT
There is emerging evidence that strong community structures, measurable via social capital metrics, are beneficial for health outcomes. Here we demonstrate an association between neighborhood obesity rates and check-in and venue data from a location-based social network. We first describe the possibility of using social media data to directly measure social capital, and then describe how to connect public health outcomes such as obesity to these new social capital measures. Using the Foursquare API, we collected venue information for New York City, including name, address, coordinates, and number of check-ins: information on 50,000 venues and over 15,000,000 check-ins was streamed within two months. The collected data is generalized and organized into a two-dimensional matrix indexed by location and category type; venue categories are created in a hierarchical fashion based on those provided by Foursquare, e.g. “food,” “event,” “residence,” “hotel,” etc. We show that the numbers of venues of several categories in a neighborhood are correlated with its obesity rate, either positively or negatively. In particular, there are social capital deserts linked with greater obesity rates. Broadly, this work finds that location-based social networks can be used for surveillance of social capital, which is associated with the prevalence of chronic diseases of energy imbalance. Results can be used to improve public health by targeting social capital interventions at the fine grain measurable through social media.
Categories and Subject Descriptors
H.4 [Information Systems Applications]: Miscellaneous
Keywords
intervention prioritization, location-based social media, neighborhoods, New York City, obesity, public health surveillance, social capital, social good
1. INTRODUCTION
Data for Good Exchange 2015 New York, New York USA
Copyright 2015 ACM X-XXXXX-XX-X/XX/XX ...$15.00.
To directly measure aspects of social environments that may influence health outcomes, without recall bias and at spatiotemporally fine grain, we propose to perform public health surveillance through ubiquitous data generated by a location-based social network (LBSN) such as Foursquare.¹ Fine-grained health surveillance may allow organizations to raise the global quality of life by pinpointing where interventions are most needed. Indeed, prioritization using data analytics is crucial whenever resources are limited [20].
We focus on chronic diseases like obesity that have several
deleterious health effects [5]; obesity is a known risk factor
for conditions including diabetes, high blood pressure, heart
disease, and sleep apnea.
The recent rapid rise in obesity prevalence points not only to biological factors but also to environmental ones. Obesity is generally understood to be caused by behavioral choices regarding nutrition and exercise; rises in obesity are explained by changes in the availability of foods and the emergence of predominantly sedentary lifestyles [1]. Since the known environmental risk factors are modifiable, policy interventions have focused on modifying obesogenic environments, where high-calorie foods are over-abundant, time is restricted, and exercise is optional rather than obligatory. Approaches include price-based interventions, such as a sugar-sweetened beverage tax, and place-based policies to increase the availability of higher-quality foods in lower-income areas. Empirically, nutritional bundles have been found to be largely unaffected by the local purchasing environment, whereas taxation can have a strong effect [11].
Beyond the built environment, the social environment has also been implicated in health outcomes such as obesity, as seen, for example, in the existence of obesity contagions [6]. In particular, discussions of social capital focus on how social relations have productive benefits, whether for economic prosperity, happiness, or health. Putnam suggests that “As a rough rule of thumb, if you belong to no groups but decide to join one, you cut your risk of dying over the next year in half” [18].
Previous public health and epidemiology studies have demonstrated associations between social capital and obesity prevalence. State-level correlational analysis demonstrates that social capital (operationalized by Putnam’s measure, Table 2) has a significant bivariate relationship with rates of obesity and diabetes, as reported by the Centers for Disease Control and Prevention Behavioral Risk Factor Surveillance System (BRFSS), explaining 10% of the variance in obesity and 44% of the variance in diabetes [12]. State- and county-level correlational analysis also found significant relationships between social capital measures and the prevalence of obesity as well as physical inactivity [13]. A similar state-level study reported that social capital affects obesity through the promotion of weight-control efforts [21].

¹ https://foursquare.com/
Moving from the collective to the individual level, studies have also found associations between individual trust, participation, and social capital on the one hand and waist circumference and body mass index on the other [15], even when controlling for socioeconomic status and lifestyle factors [16]. Since there may be associations between social capital metrics and chronic disease prevalence, surveillance of one may provide surveillance of the other.
Social surveys such as BRFSS and the EpiQuery Community Health Survey in New York City (NYC) measure public health, but have limited granularity and frequency due to the expense of data collection. Furthermore, survey-based methodologies inherently suffer from recall bias and other biases. Indeed, biases in retrospective surveys have been a longstanding limitation to measuring social environments [17].
To measure social environments finely and directly, without cognitive bias, we use data that is immediately geocoded and logged by Foursquare. In particular, we draw on a data set of 50,000 venues and over 15,000,000 check-ins that we streamed and collected within a two-month period in NYC.
Recently, people’s stated interests in social media data from Facebook have been found to correlate with obesity at fine spatial resolutions [7], whether in cities across the United States or in neighborhoods in NYC. Stated interests, however, differ from actual social activity, which is why we focus on social capital data from an LBSN. In an approach similar to ours, linguistic analysis of geocoded Twitter activity showed that certain food words are associated with several public health measures at the county level [9]. This was extended to consider both food and activities, and associations between caloric balance and public health measures were shown at the state level [3].
Note that Foursquare has become a common data source for measuring culture and social dynamics, e.g. to cluster geographic regions into natural neighborhoods [8, 10, 22]. Notably, Foursquare data on eating habits [19] and on fast food restaurant locations [14] has also been used.
As detailed below, we collected data for each venue registered on Foursquare across NYC over a two-month period, May 2014 to July 2014. The information that we aggregated and analyzed at the neighborhood level includes venue name, category, number of check-ins, address, and coordinates. Broadly, venue information indicates both social activity (actual check-ins) and social opportunity (presence of venues). We screen both kinds of indicators and find that opportunity is a better predictor of obesity rates in NYC than activity. That is to say, in analogy to food deserts, which have previously been implicated in obesity [11], there may be social capital deserts. We also note some possible limitations and biases of our study.
2. DATA COLLECTION AND PREPROCESSING
Foursquare is an LBSN that enables users to share their current location with friends, rate and review venues they visit, and read reviews posted by other users. Foursquare users share their location information by checking in via their mobile devices. Each check-in is associated with a web page that contains information about the user, the venue, and other details of the visit. Each venue is also associated with a web page that indicates its top-level category, its sub-category within that category, etc., and that aggregates information from user check-ins. The high-level categories of venues are listed in Table 1. Foursquare was first launched in NYC in 2009 and had more than 45 million registered users at the time of our two-month data collection period, May 2014 to July 2014. The Foursquare API provides methods to access various data and metadata, such as searching for venues, listing categories, etc.

Table 1: Foursquare Venue Categories
Arts & Entertainment
College & University
Event
Food
Nightlife Spot
Outdoors & Recreation
Professional & Other Places
Residence
Shop & Services
Travel & Transportation

Table 2: Putnam’s Social Capital Metrics
Club meetings attended last year
Community projects worked on last year
Times entertained at home last year
Times volunteered last year
“I spend a lot of time visiting friends”
“Most people are honest”
Served on committee for local organization
Served as officer of club or organization
Attended meeting on town or school affairs
501(c)(3) organizations per capita
Mean number of group memberships
“Most people can be trusted”
Civic and social organizations per capita
Mean presidential election turnout
Note that if one compares Foursquare categories and their sub-categories to Putnam’s measures of social capital (Table 2), which are largely opportunistic and not necessarily principled, one sees several concordances. As such, Foursquare potentially offers a good opportunity for measuring social capital.
2.1 Foursquare Data
Using the Foursquare API, we collected streaming data on 50,000 venues and over 15,000,000 check-ins within NYC. Venue web pages show only the current cumulative number of check-ins, and venues were established at different times, so they have accumulated check-ins over periods of differing length. We therefore collected data at two time points separated by two months and subtracted the counts, which yields the number of check-ins during the two-month collection period. Because Foursquare limits the number of queries per hour, not all venue data could be obtained simultaneously. Nevertheless, collecting data across the whole city took no more than 24 hours at each time point, so staged collection should have little impact in this cross-sectional study.

Figure 1: Histogram of number of check-ins at venues.
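The two-snapshot differencing can be sketched as follows. This is a minimal sketch, not the authors' code: it assumes each snapshot has already been reduced to a hypothetical mapping from venue ID to the cumulative check-in count shown on the venue page. Venues missing from the first snapshot are skipped and negative differences are clamped to zero; both are assumptions made for illustration.

```python
from typing import Dict


def checkins_in_window(first: Dict[str, int], second: Dict[str, int]) -> Dict[str, int]:
    """Check-ins accrued between two snapshots of cumulative venue counts.

    `first` and `second` map a venue ID to its cumulative check-in count at
    the start and end of the collection window (hypothetical input format).
    """
    window = {}
    for venue_id, later in second.items():
        earlier = first.get(venue_id)
        if earlier is None:
            # Assumption: venues absent from the first snapshot are skipped,
            # since their starting count is unknown.
            continue
        # Assumption: clamp at zero in case a cumulative count was revised downward.
        window[venue_id] = max(0, later - earlier)
    return window


if __name__ == "__main__":
    start = {"v1": 120, "v2": 40}
    end = {"v1": 310, "v2": 40, "v3": 15}
    print(checkins_in_window(start, end))  # {'v1': 190, 'v2': 0}
```

Differencing cumulative counts puts every venue on the same two-month footing regardless of when it was created, which is the point of the two-snapshot design.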
The Foursquare API search endpoint supports only two query geometries, circular and rectangular, whereas city neighborhoods have neither shape. To minimize duplicate venues and maximize efficiency, we chose rectangular boxes for venue searching, retrieving venue data by placing many search boxes of different sizes over the city. Nevertheless, the raw data collected from the Foursquare API contained duplicate venues, which would bias the analysis. Using a hash table keyed on venue ID, we eliminated repeated venues streamed from the Foursquare API.
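Both steps can be sketched under simplifying assumptions. The sketch tiles a latitude/longitude bounding box with a uniform grid of rectangles, whereas the authors placed boxes of different sizes. It then removes duplicates with a Python dictionary (a built-in hash table) keyed on venue ID. The function names, the corner convention, the `"id"` field, and the rough NYC bounding box in the usage example are all hypothetical; the API request that would consume each rectangle is not shown.

```python
from typing import Dict, Iterable, List, Tuple

# A box is ((sw_lat, sw_lon), (ne_lat, ne_lon)).
Box = Tuple[Tuple[float, float], Tuple[float, float]]


def tile_bbox(south: float, west: float, north: float, east: float,
              rows: int, cols: int) -> List[Box]:
    """Split a bounding box into rows x cols rectangles given by SW and NE corners."""
    dlat = (north - south) / rows
    dlon = (east - west) / cols
    boxes = []
    for i in range(rows):
        for j in range(cols):
            sw = (south + i * dlat, west + j * dlon)
            ne = (south + (i + 1) * dlat, west + (j + 1) * dlon)
            boxes.append((sw, ne))
    return boxes


def deduplicate(batches: Iterable[Iterable[Dict]]) -> List[Dict]:
    """Merge per-box result batches, keeping the first record seen for each venue ID."""
    seen: Dict[str, Dict] = {}
    for batch in batches:
        for venue in batch:
            # setdefault only stores the venue if its ID has not been seen yet.
            seen.setdefault(venue["id"], venue)
    return list(seen.values())


if __name__ == "__main__":
    # Rough, illustrative NYC extent; not taken from the paper.
    boxes = tile_bbox(40.49, -74.26, 40.92, -73.70, rows=4, cols=4)
    print(len(boxes))  # 16
    batches = [[{"id": "a"}, {"id": "b"}], [{"id": "b"}, {"id": "c"}]]
    print([v["id"] for v in deduplicate(batches)])  # ['a', 'b', 'c']
```

Adjacent boxes share edges, and an API may return venues slightly outside a requested box, so deduplication by ID is needed even with a non-overlapping grid.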
As described in Section 2.2, we use neighborhood-level obesity rate data from the New York City Department of Health and Mental Hygiene, collected using the United Hospital Fund (UHF) neighborhood index, which divides NYC into 34 neighborhoods based largely on postal code regions. As such, the venue data also needs to be geospatially registered into the 34 neighborhoods. Most venues are assigned to their neighborhoods using their listed postal codes. For venues without a listed postal code, latitude and longitude are used to locate their respective neighborhoods, essentially by constructing convex hulls from the coordinates of venues whose neighborhoods are known.
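The registration step might look like the sketch below. It assumes a hypothetical mapping from postal code to UHF neighborhood and venue records with hypothetical `postal_code`, `lat`, and `lng` fields. The hull construction uses Andrew's monotone chain algorithm, which is one standard choice; the paper does not say which algorithm was used.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]  # (longitude, latitude)


def cross(o: Point, a: Point, b: Point) -> float:
    """z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def inside_convex(hull: List[Point], p: Point) -> bool:
    """True if p lies inside or on the boundary of a counter-clockwise convex polygon."""
    n = len(hull)
    if n < 3:
        return False
    return all(cross(hull[i], hull[(i + 1) % n], p) >= 0 for i in range(n))


def build_hulls(known_venues: List[Dict], zip_to_hood: Dict[str, str]) -> Dict[str, List[Point]]:
    """One hull per neighborhood, built from venues whose postal code maps to it."""
    points: Dict[str, List[Point]] = {}
    for v in known_venues:
        hood = zip_to_hood.get(v.get("postal_code"))
        if hood is not None:
            points.setdefault(hood, []).append((v["lng"], v["lat"]))
    return {hood: convex_hull(pts) for hood, pts in points.items()}


def assign_neighborhood(venue: Dict, zip_to_hood: Dict[str, str],
                        hulls: Dict[str, List[Point]]) -> Optional[str]:
    """Use the postal code when it is known; otherwise fall back to hull containment."""
    postal = venue.get("postal_code")
    if postal in zip_to_hood:
        return zip_to_hood[postal]
    point = (venue["lng"], venue["lat"])
    for hood, hull in hulls.items():
        if inside_convex(hull, point):
            return hood  # first matching hull wins if hulls overlap
    return None  # outside every hull


if __name__ == "__main__":
    zip_to_hood = {"10001": "hood_A", "10002": "hood_B"}  # toy mapping, not the UHF table
    known = [{"postal_code": "10001", "lng": x, "lat": y}
             for x, y in [(0, 0), (1, 0), (1, 1), (0, 1)]]
    hulls = build_hulls(known, zip_to_hood)
    print(assign_neighborhood({"lng": 0.5, "lat": 0.5}, zip_to_hood, hulls))  # hood_A
```

Convex hulls of adjacent neighborhoods can overlap and need not cover the whole city. The first-match rule and the `None` result are therefore design choices that a real pipeline would need to revisit.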
Note that check-ins are distributed non-uniformly across venues (Figure 1). Popular venues, mostly located in Manhattan, have around 2,000 check-ins in two months, whereas most venues have fewer than 300 check-ins.
Foursquare provides its own categorization for each registered venue. Each of the ten general categories listed in Table 1 is broken down into more detailed sub-categories. For example, the “Food” category includes “Asian Restaurant,” “BBQ Joint,” “Buffet,” and more. To perform a correlative study with obesity rate, we use these default categories for venues. Some venues belong to several categories, but to avoid statistical difficulties we take only the category listed first in each venue’s information; this is the primary classification.
Venue data is sorted into a two-dimensional matrix with columns corresponding to the 34 UHF neighborhoods and rows corresponding to venue categories. This matrix is used for regression modeling and further analysis.
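A minimal sketch of building this category-by-neighborhood matrix, assuming each venue record carries a hypothetical `categories` list (first entry = primary category), a `neighborhood` label from the registration step, and a `checkins` count. Counting venues gives opportunity features; summing check-ins gives activity features. Handling both through one `weight_key` parameter is an illustrative choice, not the authors' implementation.

```python
from typing import Dict, List, Optional, Sequence


def primary_category(venue: Dict) -> Optional[str]:
    """The category listed first for a venue, treated as its primary classification."""
    categories = venue.get("categories") or []
    return categories[0] if categories else None


def category_matrix(venues: Sequence[Dict], categories: Sequence[str],
                    neighborhoods: Sequence[str],
                    weight_key: Optional[str] = None) -> List[List[float]]:
    """Rows = categories, columns = neighborhoods.

    With weight_key=None each venue adds 1 (opportunity: number of venues);
    with weight_key="checkins" it adds that venue's check-in count (activity).
    """
    col = {hood: j for j, hood in enumerate(neighborhoods)}
    row = {cat: i for i, cat in enumerate(categories)}
    matrix = [[0.0] * len(neighborhoods) for _ in categories]
    for venue in venues:
        cat = primary_category(venue)
        hood = venue.get("neighborhood")
        if cat not in row or hood not in col:
            continue  # uncategorized or unregistered venues are skipped
        matrix[row[cat]][col[hood]] += 1 if weight_key is None else venue[weight_key]
    return matrix


if __name__ == "__main__":
    venues = [
        {"categories": ["Food", "Nightlife Spot"], "neighborhood": "A", "checkins": 3},
        {"categories": ["Food"], "neighborhood": "B", "checkins": 5},
        {"categories": ["Event"], "neighborhood": "A", "checkins": 2},
    ]
    print(category_matrix(venues, ["Food", "Event"], ["A", "B"]))
    print(category_matrix(venues, ["Food", "Event"], ["A", "B"], weight_key="checkins"))
```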
2.2 Obesity Data
Obesity rate data across NYC is obtained from the New York City Department of Health and Mental Hygiene.² Using its online interactive health data system, EpiQuery, we extracted obesity rates for the neighborhoods of NYC. As shown in Figure 2, both obesity rates and numbers of check-ins are distributed non-uniformly across NYC. In particular, Manhattan, which has more venues, shows higher numbers of check-ins together with lower obesity rates, the so-called “Manhattan Effect” that we discuss below.
Wealth and income may be connected both to Foursquare usage and to venue locations. Hence we also obtain these data at the neighborhood level from the American Community Survey of the United States Census Bureau.
3. ANALYSIS
To assess the association between Foursquare-derived indicators and obesity rates, we performed a basic assessment by fitting multivariate linear regression models over the 34 neighborhoods in NYC. As discussed above, we are interested in notions of opportunity (number of venues) and activity (total number of check-ins at venues) in neighborhoods. Venue data is therefore separated into a set of opportunity features and a set of activity features, each covering all 10 high-level categories listed in Table 1. Two multivariate linear regression models were trained; Table 3 shows the resulting R² values.
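The R² of such a model can be computed with ordinary least squares. The sketch below fits y on the features plus an intercept by solving the normal equations with Gaussian elimination, so it needs only the standard library. It assumes `features[i]` holds the 10 category values for neighborhood i and `y[i]` its obesity rate; function names are hypothetical. In practice a QR- or SVD-based solver such as NumPy's `numpy.linalg.lstsq` is numerically preferable to the normal equations.

```python
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("design matrix is (nearly) singular")
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols_r2(features: Sequence[Sequence[float]], y: Sequence[float]) -> float:
    """R^2 of an OLS fit of y on the features plus an intercept."""
    X = [[1.0] + [float(v) for v in row] for row in features]
    k = len(X[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in X]
    mean_y = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - sse / sst


if __name__ == "__main__":
    # Toy data, not from the study: one feature, four "neighborhoods".
    print(round(ols_r2([[1], [2], [3], [4]], [1, 3, 2, 4]), 4))  # 0.64
```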
Because of the disparities in wealth among neighborhoods, one might wonder whether the regression result is largely due to such covariates. Table 3 therefore also gives R² values after controlling for neighborhood median income and for neighborhood median net worth. We conclude that both social capital activity and opportunity explain much of the variation in obesity rates, even when controlling for neighborhood financial capital demographics. Without controlling for financial capital, social capital opportunity in a given neighborhood is more strongly associated with obesity rate than social capital activity (R² of 0.7357 versus 0.6984), so going forward we focus on opportunity rather than activity. Besides its greater coefficient of multiple determination, opportunity is also more amenable to policy action than activity.
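The paper does not state exactly how the controlled R² values are computed. One common definition, used here purely as an assumption, is the partial R². This is the fraction of the residual variance left by a control-only model (e.g. median income) that the Foursquare features additionally explain: (SSE_control - SSE_full) / SSE_control. The helper names `sse` and `partial_r2` are hypothetical, and the small least-squares solver is included so that the block is self-contained.

```python
from typing import List, Sequence


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(a)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("design matrix is (nearly) singular")
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def sse(rows: Sequence[Sequence[float]], y: Sequence[float]) -> float:
    """Residual sum of squares of an OLS fit of y on the given rows plus an intercept."""
    X = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    beta = _solve(xtx, xty)
    return sum((yi - sum(bj * xj for bj, xj in zip(beta, r))) ** 2 for r, yi in zip(X, y))


def partial_r2(features: Sequence[Sequence[float]], controls: Sequence[Sequence[float]],
               y: Sequence[float]) -> float:
    """Share of the control-only model's residual variance explained by adding the features."""
    reduced = [list(c) for c in controls]
    full = [list(c) + list(f) for c, f in zip(controls, features)]
    sse_reduced = sse(reduced, y)
    if sse_reduced == 0:
        raise ValueError("controls already fit y exactly")
    return (sse_reduced - sse(full, y)) / sse_reduced
```

Under this definition the controlled value can only fall below the uncontrolled R² when the control absorbs variance the features would otherwise explain. Other definitions, such as regressing residualized obesity rates on the features, would give different numbers.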
Moving beyond the high-level categories, we also consider the 664 lower-level categories and fit a linear regression of obesity rate on each category independently, treating the results as a multiple testing problem [4]. As is standard, we do not account for collinearity among these features. Figure 3 shows scatter plots of the uncorrected p-values of these linear regressions as a function of their r² values, for both activity and opportunity. The p-values are exceedingly small, so they remain statistically significant even if multiplied 1000-fold, as one might in a very conservative application of the Bonferroni correction.

² http://www.nyc.gov/html/doh/html/home/home.shtml

Figure 2: Manhattan has the highest number of check-ins and the lowest obesity rates compared with other boroughs, the “Manhattan Effect”. (a) Obesity rate distribution in NYC, where lighter colors correspond to lower obesity rates and darker colors correspond to higher obesity rates. (b) Distribution of the number of check-ins in New York City, where lighter colors correspond to higher numbers of check-ins and darker colors correspond to lower numbers of check-ins.

Table 3: Linear Model Association between Obesity Rates and High-Level Foursquare Category Features
Coefficient of determination      Venues    Check-ins
R²                                0.7357    0.6984
R² (controlling for income)       0.5819    0.6907
R² (controlling for net worth)    0.4791    0.5540
Table 4 lists summary statistics for both activity and opportunity. Fewer categories have r² above 0.2 for venue counts (59) than for check-ins (131); nevertheless, opportunity performs better in the regression model for predicting neighborhood obesity rates. Figure 4 shows subcategories of opportunity that are negatively correlated with obesity rates, whereas Figure 5 shows subcategories of opportunity that are positively correlated with obesity rates.
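The per-category screening can be sketched as follows, assuming each lower-level category has already been turned into a vector of per-neighborhood values (venue counts or check-in totals) aligned with the obesity rates. For a single predictor, the r² of a simple linear regression equals the squared Pearson correlation, which is what `r_squared` computes. `screen` keeps categories above the 0.2 threshold used in Table 4, and `bonferroni` applies the standard adjustment min(1, m·p) for m tests. Computing the per-category p-values themselves requires a t-distribution CDF (for example via SciPy's `scipy.stats.linregress`) and is omitted. All function names are hypothetical.

```python
from typing import Dict, List, Sequence


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation, i.e. r^2 of the simple regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant feature (e.g. a category absent everywhere) explains nothing
    return sxy * sxy / (sxx * syy)


def screen(features: Dict[str, Sequence[float]], y: Sequence[float],
           threshold: float = 0.2) -> Dict[str, float]:
    """r^2 of every category whose univariate fit exceeds the threshold."""
    scores = {cat: r_squared(x, y) for cat, x in features.items()}
    return {cat: r2 for cat, r2 in scores.items() if r2 > threshold}


def bonferroni(pvalues: Sequence[float]) -> List[float]:
    """Bonferroni-adjusted p-values: each p multiplied by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


if __name__ == "__main__":
    obesity = [1, 3, 2, 4]  # toy values, not data from the study
    feats = {"cat_a": [1, 2, 3, 4], "cat_b": [2, 2, 2, 2]}
    print(sorted(screen(feats, obesity)))  # ['cat_a']
```

The screening step ignores collinearity between categories, matching the text's per-category treatment; the multivariate fit over the high-level categories is the place where features are considered jointly.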
4. CONCLUSION
The environment has a strong effect on health outcomes, and although the built environment can be assessed at fine spatiotemporal scales using remote sensing [2], this has been difficult for the social environment. Here we proposed using data from an LBSN to perform a kind of social remote sensing, which logs data immediately and is therefore not subject to the cognitive biases of retrospective surveys.
Our exploratory research indicates that social capital indicators from an LBSN like Foursquare are indeed associated with public health measures of obesity rate in New York City. Moreover, social capital opportunity, in terms of number of venues, is more predictive than social capital activity, in terms of number of check-ins. We can interpret this in terms of “social capital deserts,” in analogy to the “food deserts” that have previously been associated with obesity: a lack of opportunities to build social capital may increase the risk of high obesity rates.
Some points to note in interpreting results, and possibly
explaining why opportunity is more indicative than activity:
• Sampling bias in check-ins, since Foursquare users are limited to certain demographic groups, e.g. people with smartphones.
• “Manhattan Effect”, where Manhattan has the most venues and check-ins. Since the overall obesity rate in Manhattan is lower than in other boroughs, it may bias regression results.
• Seasonality, since check-in data was collected in the
summer.
In future work, we intend to address these statistical issues, which may be biasing our results, by drawing on advanced methodologies in data science. For example, mitigating the “Manhattan Effect” is at the top of our agenda. Controlling for population demographics will help with the sampling bias of Foursquare users. Collecting data over a longer term, instead of restricting the collection period to two months, would mitigate possible seasonality effects. Another data limitation that remains to be addressed is that where people live (and report public health measures) differs from where they conduct social activities; perhaps population flow statistics from taxi cabs and subways can be used.
The same basic data and association methodology can also be applied to other chronic diseases such as diabetes. Linking people’s interests (as measurable via Facebook [7]) with their social opportunities may also allow a better understanding of what leads to their activities, and give a broader sense of the social environment in public health.
Figure 3: Scatter plots of r² and uncorrected p-value for (a) number of venues and (b) number of check-ins.
Table 4: Check-in and Venue Statistics
(a) Check-ins data (number of categories with r² > 0.2: 131)
Attribute   r²         Uncorrected p-value
Mean        0.3197     9.696 × 10^-13
Median      0.3140     1.837 × 10^-18
Variance    0.006889   1.221 × 10^-22
(b) Venue data (number of categories with r² > 0.2: 59)
Attribute   r²         Uncorrected p-value
Mean        0.2792     9.173 × 10^-12
Median      0.2499     1.108 × 10^-17
Variance    0.005265   2.784 × 10^-21
Our results are given at the level of neighborhoods, but perhaps predictions can be disaggregated even further, to the level of postal code regions or even city blocks. Finding the granularity at which aggregate behavior becomes individual behavior is of general interest for public health surveillance. Indeed, the overarching public health goal of our data-driven study is to measure chronic disease at finer spatial granularity, so that social capital interventions can be prioritized and targeted to make the best use of limited resources.
5. REFERENCES
[1] Obesity: Preventing and managing the global
epidemic. Technical Report 894, World Health
Organization, 2002.
[2] B. Abelson, K. R. Varshney, and J. Sun. Targeting
direct cash transfers to the extremely poor. In Proc.
20th ACM SIGKDD Int. Conf. Knowl. Discov. Data
Min. (KDD ’14), pages 1563–1572, Aug. 2014.
[3] S. E. Alajajian, J. R. Williams, A. J. Reagan, S. C.
Alajajian, M. R. Frank, L. Mitchell, J. Lahne, C. M.
Danforth, and P. S. Dodds. The lexicocalorimeter:
Gauging public health through caloric input and
output on social media. arXiv:1507.05098v1
[physics.soc-ph], July 2015.
[4] H. Bannerman-Thompson, M. B. Rao, and
R. Chakraborty. Multiple testing of hypotheses in
biomedical research. In C. R. Rao, R. Chakraborty,
and P. K. Sen, editors, Bioinformatics in Human
Health and Heredity, pages 201–238. Elsevier, 2012.
[5] N. Cameron, N. G. Norgan, and G. T. H. Ellison,
editors. Childhood Obesity: Contemporary Issues. CRC
Press, Boca Raton, FL, 2006.
[6] N. A. Christakis and J. H. Fowler. The spread of
obesity in a large social network over 32 years. New
Engl. J. Med., 357(4):370–379, July 2007.
[7] R. Chunara, L. Bouton, J. W. Ayers, and J. S.
Brownstein. Assessing the online social environment
for surveillance of obesity prevalence. PLoS ONE,
8(4):e61373, 2013.
[8] J. Cranshaw, R. Schwartz, J. Hong, and N. Sadeh.
The livehoods project: Utilizing social media to
understand the dynamics of a city. In Proc. 6th Int.
AAAI Conf. Weblogs and Social Media (ICWSM),
June 2012.
[9] A. Culotta. Estimating county health statistics with
Twitter. In Proc. SIGCHI Conf. Hum. Factors
Comput. Syst. (CHI 2014), pages 1335–1344, Apr.
2014.
[10] G. Le Falher, A. Gionis, and M. Mathioudakis. Where
is the Soho of Rome?: Measures and algorithms for
finding similar neighborhoods in cities. In Proc. 9th
Int. AAAI Conf. Weblogs and Social Media (ICWSM),
pages 228–237, May 2015.
[11] M. Harding and M. Lovenheim. The effect of prices on
nutrition: Comparing the impact of product- and
nutrient-specific taxes. Working Paper 19781, NBER,
Jan. 2014.
[12] D. R. Holtgrave and R. Crosby. Is social capital a
protective factor against obesity and diabetes?
Findings from an exploratory study. Ann. Epidemiol.,
16(5):406–408, May 2006.
[13] D. Kim, S. V. Subramanian, S. L. Gortmaker, and
I. Kawachi. US state- and county-level social capital in
relation to obesity and physical inactivity: A
multilevel, multivariable analysis. Soc. Sci. Med.,
63(4):1045–1059, Aug. 2006.
[14] Y. Mejova, H. Haddadi, A. Noulas, and I. Weber.
#FoodPorn: Obesity patterns in culinary interactions.
In Proc. 5th Int. Conf. Digital Health 2015 (DH ’15),
pages 51–58, May 2015.
[15] S. Moore, M. Daniel, C. Paquet, L. Dubé, and
L. Gauvin. Association of individual network social
capital with abdominal adiposity, overweight and
obesity. J. Public Health, 31(1):175–183, Mar. 2009.
[16] J. M. Muckenhuber, T. E. Dorner, N. Burkert,
F. Groschädl, and W. Freidl. Low social capital as a
predictor for the risk of obesity. Health Social Work,
40(2):e51–e58, May 2015.
[17] R. W. Pearson, M. Ross, and R. M. Dawes. Personal
recall and the limits of retrospective questions in
surveys. In J. M. Tanur, editor, Questions about
Questions: Inquiries into the Cognitive Bases of
Surveys, pages 65–94. Russell Sage Foundation, 1992.
[18] R. D. Putnam. Bowling Alone: The Collapse and
Revival of American Community. Simon & Schuster,
New York, 2000.
[19] T. H. Silva, P. O. S. Vaz de Melo, J. M. Almeida,
M. Musolesi, and A. A. F. Loureiro. You are what you
eat (and drink): Identifying cultural boundaries by
analyzing food and drink habits in Foursquare. In
Proc. 8th Int. AAAI Conf. Weblogs and Social Media
(ICWSM), pages 466–475, June 2014.
[20] L. R. Varshney. Fundamental limits of data analytics
in sociotechnical systems. 2015. submitted.
[21] J. Yoon and T. T. Brown. Does the promotion of
community social capital reduce obesity risk? J.
Socio-Econ., 40(3):296–305, May 2011.
[22] A. X. Zhang, A. Noulas, S. Scellato, and C. Mascolo.
Hoodsquare: Modeling and recommending
neighborhoods in location-based social networks. In
Proc. 2013 Int. Conf. Social Comput. (SocialCom),
pages 69–74, Sept. 2013.
(a) Arts & Crafts Store (b) Bookstore (c) Design Studio
(d) French Restaurant (e) Wine Bar (f) Wine Shop
Figure 4: Scatter plots with linear regression for some subcategories that are negatively correlated with obesity rates.
(a) Automotive Shop (b) Fast Food Restaurant (c) Fried Chicken Joint
(d) Gas Station Garage (e) Housing Development (f) Subway
Figure 5: Scatter plots with linear regression for some subcategories that are positively correlated with obesity rates.

More Related Content

Similar to obesity surveillance

Public Health Spotlight June 2015
Public Health Spotlight June 2015Public Health Spotlight June 2015
Public Health Spotlight June 2015
Phillip Brennan
 
Jennifer Lee, BCBS MA Foundation
Jennifer Lee, BCBS MA FoundationJennifer Lee, BCBS MA Foundation
Jennifer Lee, BCBS MA Foundation
Mad*Pow
 
server05productnCCPP5-4CPP403.txt unknown Seq 1 13-OCT.docx
server05productnCCPP5-4CPP403.txt unknown Seq 1 13-OCT.docxserver05productnCCPP5-4CPP403.txt unknown Seq 1 13-OCT.docx
server05productnCCPP5-4CPP403.txt unknown Seq 1 13-OCT.docx
lesleyryder69361
 
RESEARCH ARTICLEPerceived discrimination in medical settin.docx
RESEARCH ARTICLEPerceived discrimination in medical settin.docxRESEARCH ARTICLEPerceived discrimination in medical settin.docx
RESEARCH ARTICLEPerceived discrimination in medical settin.docx
rgladys1
 
mph 609 Week 8 assignment
mph 609 Week 8 assignmentmph 609 Week 8 assignment
mph 609 Week 8 assignment
Steven Banjoff
 
ajph.2014.302505
ajph.2014.302505ajph.2014.302505
ajph.2014.302505
Jerry Jones
 
The department of health in taiwan initiated community health development
The department of health in taiwan initiated community health developmentThe department of health in taiwan initiated community health development
The department of health in taiwan initiated community health development
Maricris Santos
 
Secondary Data Table Template The data obtained on this table is.docx
Secondary Data Table Template The data obtained on this table is.docxSecondary Data Table Template The data obtained on this table is.docx
Secondary Data Table Template The data obtained on this table is.docx
rtodd280
 
Task Force Project—Applying TheoryIn Module 1, you began.docx
Task Force Project—Applying TheoryIn Module 1, you began.docxTask Force Project—Applying TheoryIn Module 1, you began.docx
Task Force Project—Applying TheoryIn Module 1, you began.docx
briankimberly26463
 

Similar to obesity surveillance (20)

Public Health Spotlight June 2015
Public Health Spotlight June 2015Public Health Spotlight June 2015
Public Health Spotlight June 2015
 
Jennifer Lee, BCBS MA Foundation
Jennifer Lee, BCBS MA FoundationJennifer Lee, BCBS MA Foundation
Jennifer Lee, BCBS MA Foundation
 
Social Determinant of Health
Social Determinant of HealthSocial Determinant of Health
Social Determinant of Health
 
Teagen Johnson: CHNA Dane County, WI: Creighton MPH602
Teagen Johnson: CHNA Dane County, WI: Creighton MPH602Teagen Johnson: CHNA Dane County, WI: Creighton MPH602
Teagen Johnson: CHNA Dane County, WI: Creighton MPH602
 
server05productnCCPP5-4CPP403.txt unknown Seq 1 13-OCT.docx
server05productnCCPP5-4CPP403.txt unknown Seq 1 13-OCT.docxserver05productnCCPP5-4CPP403.txt unknown Seq 1 13-OCT.docx
server05productnCCPP5-4CPP403.txt unknown Seq 1 13-OCT.docx
 
RESEARCH ARTICLEPerceived discrimination in medical settin.docx
RESEARCH ARTICLEPerceived discrimination in medical settin.docxRESEARCH ARTICLEPerceived discrimination in medical settin.docx
RESEARCH ARTICLEPerceived discrimination in medical settin.docx
 
mph 609 Week 8 assignment
mph 609 Week 8 assignmentmph 609 Week 8 assignment
mph 609 Week 8 assignment
 
Colloque RI 2014 : Intervention de Ross C. BROWNSON (Washington University in...
Colloque RI 2014 : Intervention de Ross C. BROWNSON (Washington University in...Colloque RI 2014 : Intervention de Ross C. BROWNSON (Washington University in...
Colloque RI 2014 : Intervention de Ross C. BROWNSON (Washington University in...
 
ajph.2014.302505
ajph.2014.302505ajph.2014.302505
ajph.2014.302505
 
Integrating Health, Livable Communities and Transit: A How-To Discussion by E...
Integrating Health, Livable Communities and Transit: A How-To Discussion by E...Integrating Health, Livable Communities and Transit: A How-To Discussion by E...
Social Capital Deserts: Obesity Surveillance using a Location-Based Social Network

Hongyang Bai, University of Illinois at Urbana-Champaign, hbai4@illinois.edu
Rumi Chunara, New York University, rumi.chunara@nyu.edu
Lav R. Varshney, University of Illinois at Urbana-Champaign, varshney@illinois.edu

ABSTRACT
There is emerging evidence that strong community structures, measurable via social capital metrics, are beneficial for health outcomes. Here we demonstrate an association between obesity rates in neighborhoods and check-in and venue data from a location-based social network. We first describe the possibility of using social media data to directly measure social capital, and then describe how to connect public health outcomes such as obesity to these new social capital measures. Using the Foursquare API, we collected venue information, including name, address, coordinates, and number of check-ins, for New York City: information on 50,000 venues and over 15,000,000 check-ins was streamed within two months. The collected data are aggregated and categorized into a two-dimensional matrix indexed by location and category type; venue categories are created hierarchically from those provided by Foursquare, e.g. “food,” “event,” “residence,” “hotel.” We show that the numbers of venues of several categories in a neighborhood are correlated with its obesity rate, either positively or negatively. In particular, there are social capital deserts linked with greater obesity rates. Broadly, this work finds that location-based social networks can be used for surveillance of social capital, which is associated with the prevalence of chronic diseases of energy imbalance. The results can be used to improve public health by targeting social capital interventions at the fine grain measurable through social media.
Categories and Subject Descriptors
H.4 [Information Systems Applications]: Miscellaneous

Keywords
intervention prioritization, location-based social media, neighborhoods, New York City, obesity, public health surveillance, social capital, social good

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
Data for Good Exchange 2015 New York, New York USA
Copyright 2015 ACM X-XXXXX-XX-X/XX/XX ...$15.00.

1. INTRODUCTION
To directly measure aspects of social environments that may influence health outcomes, without recall bias and at spatiotemporally fine grain, we propose to perform public health surveillance through ubiquitous data generated by a location-based social network (LBSN) such as Foursquare (https://foursquare.com/). Fine-grained health surveillance may allow organizations to raise the global quality of life by pinpointing where interventions are most needed. Indeed, prioritization using data analytics is crucial whenever resources are limited [20].

We focus on chronic diseases like obesity that have several deleterious health effects [5]; obesity is a known risk factor for conditions including diabetes, high blood pressure, heart disease, and sleep apnea. The recent rapid rise in obesity prevalence points not only to biological factors but also to environmental ones. Obesity is generally understood to be caused by behavioral choices regarding nutrition and exercise; rises in obesity are explained by changes in the availability of foods and the emergence of predominantly sedentary lifestyles [1]. Since the known environmental risk factors are modifiable, policy interventions have focused on modifying obesogenic environments, where high-calorie foods are over-abundant, time is restricted, and exercise is optional rather than obligatory. Approaches include price-based interventions, such as a sugar-sweetened beverage tax, and place-based policies to increase the availability of higher-quality foods in lower-income areas. Empirically, nutritional bundles are found to be largely unaffected by the local purchasing environment, whereas taxation can have a strong effect [11].

Beyond the built environment, the social environment has also been implicated in health outcomes such as obesity, as seen for example in the existence of obesity contagion [6]. In particular, discussions of social capital focus on how social relations have productive benefits, whether for economic prosperity, happiness, or health. Putnam suggests that “As a rough rule of thumb, if you belong to no groups but decide to join one, you cut your risk of dying over the next year in half” [18].

Previous public health and epidemiology studies have demonstrated associations between social capital and obesity prevalence. State-level correlational analysis shows that social capital (operationalized by Putnam’s measure, Table 2) has a significant bivariate relationship with rates of obesity and diabetes, as reported by the Centers for Disease Control and Prevention Behavioral Risk Factor Surveillance System (BRFSS), explaining 10% of the variance in obesity and 44% of the variance in diabetes [12]. State- and county-level correlational analysis also found significant relationships between social capital measures and the prevalence of obesity as well as physical inactivity [13]. A similar state-level study reported that social capital affects obesity through the promotion of weight-control efforts [21]. Carrying the relevance of social capital from the collective to the individual, associations have also been found relating individual trust, participation, and social capital to waist circumference and body mass index [15], even when controlling for socioeconomic status and lifestyle factors [16]. Since social capital metrics may be associated with chronic disease prevalence, surveillance of one may provide surveillance of the other.

Social surveys such as BRFSS and the EpiQuery Community Health Survey in New York City (NYC) measure public health, but have limited granularity and frequency due to the expense of data collection. Furthermore, survey-based methodologies inherently suffer from recall bias and other biases. Indeed, biases in retrospective surveys have long limited the measurement of social environments [17]. To measure social environments finely and directly, without cognitive bias, we use data that is geocoded and logged immediately by Foursquare. In particular, we draw on a data set of 50,000 venues and over 15,000,000 check-ins that we streamed and collected within a two-month period in NYC.

Recently, people’s stated interests in social media data from Facebook have been found to correlate with obesity at fine spatial resolutions [7], both in cities across the United States and in neighborhoods in NYC. Stated interests, however, are different from actual social activity, which is why we focus on social capital data from an LBSN. With an approach similar to ours, linguistic analysis of geocoded Twitter activity showed that certain food words are associated with several public health measures at the county level [9]. This was extended to consider both food and activities, and associations between caloric balance and public health measures were shown at the state level [3]. Note that Foursquare has become a common data source for measuring culture and social dynamics, e.g. to cluster geographic regions into natural neighborhoods [8, 10, 22]. Notably, eating habits [19] and fast food restaurant locations [14] have been used.

As detailed below, we collected data for each venue registered on Foursquare across NYC over a two-month period, May 2014 to July 2014. The information we aggregated and analyzed at the neighborhood level includes venue name, category, number of check-ins, address, and coordinates. Broadly, venue information indicates both social activity (actual check-ins) and social opportunity (presence of venues). We screen for both kinds of indicators and find that opportunity is a better predictor of obesity rates in NYC than activity. That is to say, in analogy to food deserts, which have previously been implicated in obesity [11], there may be social capital deserts. Some possible limitations and biases of our study are also noted.

2. DATA COLLECTION AND PREPROCESSING
Foursquare is an LBSN that enables users to share their current location with friends, rate and review venues they visit, and read reviews posted by other users.
Foursquare users share their location information by checking in via their mobile devices. Each check-in is associated with a web page that contains information about the user, the venue, and other details of the visit. Each venue is also associated with a web page that indicates its top-level category, its sub-category within that category, etc., and that aggregates information from user check-ins. The top-level venue categories are listed in Table 1. Foursquare was first launched in NYC in 2009 and had more than 45 million registered users at the time of our two-month data collection period, May 2014 to July 2014. The Foursquare API provides methods to access various data and metadata, such as searching for venues, listing categories, etc.

Table 1: Foursquare Venue Categories
Arts & Entertainment
College & University
Event
Food
Nightlife Spot
Outdoors & Recreation
Professional & Other Places
Residence
Shop & Services
Travel & Transportation

Table 2: Putnam’s Social Capital Metrics
Club meetings attended last year
Community projects worked on last year
Times entertained at home last year
Times volunteered last year
“I spend a lot of time visiting friends”
“Most people are honest”
Served on committee for local organization
Served as officer of club or organization
Attended meeting on town or school affairs
501(c)(3) organizations per capita
Mean number of group memberships
“Most people can be trusted”
Civic and social organizations per capita
Mean presidential election turnout

Note that if one compares Foursquare categories and their sub-categories to Putnam’s measures of social capital in Table 2 (which are largely opportunistic and not necessarily principled), one sees several concordances. As such, Foursquare potentially offers a good opportunity for measuring social capital.

2.1 Foursquare Data
Using the Foursquare API, we collected streaming data on 50,000 venues and over 15,000,000 check-ins within NYC. Venue web pages show only the current cumulative number of check-ins, and venues were established at different times, so they have accumulated check-ins over periods of differing length. We therefore collected data at two time points two months apart and subtracted the earlier counts from the later ones, which gives the number of check-ins at each venue during the two-month collection period. Because Foursquare limits the number of queries per hour, all venue data cannot be obtained simultaneously. Nevertheless, collecting data across the whole city took no more than 24 hours at each time point, so staged collection should have little impact in this cross-sectional study.

[Figure 1: Histogram of number of check-ins at venues.]

The Foursquare API venue search endpoint accepts only two query shapes, circular and rectangular, but city neighborhoods have neither shape. To minimize duplicate venues and maximize efficiency, we chose rectangular boxes and retrieved venue data by placing search boxes of various sizes over the city. Nevertheless, the raw data collected from the Foursquare API contained duplicate venues, which would bias the analysis. Repeated venues were eliminated using a hash table keyed on venue ID.

As detailed in Section 2.2, we use neighborhood-level obesity rate data from the New York City Department of Health and Mental Hygiene, collected using the United Hospital Fund (UHF) neighborhood index, which divides NYC into 34 neighborhoods based largely on postal code regions. The venue data therefore also needs to be geospatially registered into the 34 neighborhoods. Using the postal codes provided for most venues, venue information is organized into the corresponding neighborhoods. Venues without a listed postal code are placed in their respective neighborhoods using their latitude and longitude, essentially by constructing convex hulls from the coordinates of venues whose neighborhoods are known.
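The three preprocessing steps just described (differencing two snapshots of cumulative check-in counts, removing duplicate venues returned by overlapping search boxes, and assigning venues without postal codes via convex hulls) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors’ code: the record fields (`id`, `postal_code`, `lat`, `lng`), the function names, and the postal-code-to-neighborhood lookup table are hypothetical; coordinates are treated as planar, which is a reasonable approximation at city scale; and where hulls of different neighborhoods overlap, the sketch simply takes the first match.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Point = Tuple[float, float]  # (lng, lat), treated as planar at city scale


def dedupe_venues(batches: Iterable[List[dict]]) -> Dict[str, dict]:
    """Merge results from overlapping search boxes, keeping one record per venue ID."""
    unique: Dict[str, dict] = {}  # hash table keyed on venue ID
    for batch in batches:
        for venue in batch:
            unique.setdefault(venue["id"], venue)  # first occurrence wins
    return unique


def checkins_in_period(first: Dict[str, int], second: Dict[str, int]) -> Dict[str, int]:
    """Check-ins accrued between two snapshots of cumulative counts (venues seen in both)."""
    return {vid: second[vid] - first[vid] for vid in first.keys() & second.keys()}


def _cross(o: Point, a: Point, b: Point) -> float:
    """z-component of (a - o) x (b - o); positive when o -> a -> b turns left."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def in_hull(hull: List[Point], p: Point) -> bool:
    """True if p lies inside or on a counter-clockwise convex polygon."""
    n = len(hull)
    if n < 3:
        return False
    return all(_cross(hull[i], hull[(i + 1) % n], p) >= 0 for i in range(n))


def assign_neighborhoods(
    venues: Dict[str, dict], zip_to_hood: Dict[str, str]
) -> Dict[str, Optional[str]]:
    """Map venue ID -> neighborhood: postal code first, then hull membership, else None."""
    assigned: Dict[str, Optional[str]] = {}
    known: Dict[str, List[Point]] = {}
    for vid, v in venues.items():
        hood = zip_to_hood.get(v.get("postal_code") or "")
        assigned[vid] = hood
        if hood is not None:
            known.setdefault(hood, []).append((v["lng"], v["lat"]))
    hulls = {hood: convex_hull(pts) for hood, pts in known.items()}
    for vid, v in venues.items():
        if assigned[vid] is None:
            p = (v["lng"], v["lat"])
            assigned[vid] = next((h for h, hull in hulls.items() if in_hull(hull, p)), None)
    return assigned
```

Venues falling outside every hull stay unassigned in this sketch; a production pipeline might instead test points against official UHF boundary polygons, but the hull route mirrors the description above of working only from the coordinates of venues with known neighborhoods.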
Note that check-ins are non-uniformly distributed across venues (Figure 1). Popular venues, mostly located in Manhattan, can have 2,000 check-ins in two months, whereas most venues have fewer than 300.

Foursquare provides its own categorization for each registered venue. Each of the ten general categories listed in Table 1 is broken down into more detailed sub-categories. For example, under the category “Food,” there are “Asian Restaurant,” “BBQ Joint,” “Buffet,” and more. To perform a correlative study with obesity rate, we use these default categories for venues. Some venues belong to several categories, but to avoid statistical difficulties we take only the category listed first in each venue’s information; this is the primary classification. Venue data is arranged into a two-dimensional matrix with columns corresponding to the 34 UHF neighborhoods and rows corresponding to venue categories. This matrix is used for regression modeling and further analysis.

2.2 Obesity Data
Obesity rate data across NYC are obtained from the New York City Department of Health and Mental Hygiene (http://www.nyc.gov/html/doh/html/home/home.shtml). Obesity rates across the neighborhoods in NYC were extracted using its online interactive health data system, EpiQuery. As shown in Figure 2, obesity rates and numbers of check-ins are both non-uniformly distributed across NYC. Because of its location and larger number of venues, Manhattan combines a higher number of check-ins with lower obesity rates, the so-called “Manhattan Effect” discussed below. Wealth and income may be connected both to Foursquare usage and to venue locations, so we also obtain these data at the neighborhood level from the American Community Survey of the United States Census.

3. ANALYSIS
To assess the association between Foursquare-derived indicators and obesity rates, we performed a basic assessment by fitting multivariate linear regression models over the 34 neighborhoods in NYC.
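The neighborhood-by-category matrix and the multivariate fits can be sketched with numpy as follows. This is a minimal sketch, assuming each venue record carries a hypothetical `categories` list (primary category first), a `hood` label, and a per-venue check-in count; opportunity features count venues, while activity features sum check-ins. The paper does not say how it controlled for median income or net worth. Because adding a covariate can never lower an ordinary R², the sketch assumes a partial R² (the share of the variance left after regressing on the covariate that the category features explain); this is one common reading, not necessarily the authors’ procedure.

```python
from collections import Counter
from typing import List, Optional

import numpy as np


def category_matrix(
    venues: List[dict], hoods: List[str], categories: List[str], weight: Optional[str] = None
) -> np.ndarray:
    """Rows = categories, columns = neighborhoods.

    Each venue counts only under its primary (first-listed) category.
    weight=None counts venues (opportunity); weight="checkins" sums that
    per-venue field instead (activity).
    """
    totals: Counter = Counter()
    for v in venues:
        if v["categories"]:
            totals[(v["categories"][0], v["hood"])] += 1 if weight is None else v[weight]
    return np.array([[totals[(c, h)] for h in hoods] for c in categories], dtype=float)


def _sse(design: np.ndarray, y: np.ndarray) -> float:
    """Residual sum of squares of an OLS fit of y on design plus an intercept."""
    A = np.column_stack([np.ones(len(y)), design])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)


def r_squared(X: np.ndarray, y: np.ndarray) -> float:
    """Coefficient of multiple determination of y regressed on the columns of X."""
    return 1.0 - _sse(X, y) / float(np.sum((y - y.mean()) ** 2))


def partial_r_squared(X: np.ndarray, z: np.ndarray, y: np.ndarray) -> float:
    """Fraction of the residual variance after regressing y on covariate z that X explains."""
    reduced = _sse(z.reshape(-1, 1), y)
    full = _sse(np.column_stack([z, X]), y)
    return (reduced - full) / reduced
```

In use, one would transpose the matrix so that rows are neighborhoods, e.g. `X = category_matrix(venues, hoods, cats).T`, with `y` holding the 34 obesity rates and `z` the median incomes in the same neighborhood order (all hypothetical names).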
As discussed above, we are interested in notions of opportunity (number of venues) and activity (total number of check-ins at venues) in neighborhoods. As such, venue data is separated into a set of opportunity features and a set of activity features, each covering all 10 high-level categories in Table 1. Two multivariate linear regression models were trained; Table 3 shows the resulting R² values. Because wealth differs across neighborhoods, one might wonder whether the regression results are largely due to such covariates. Table 3 therefore also gives R² values after controlling for neighborhood median income and for neighborhood median net worth. We conclude that both social capital activity and social capital opportunity explain much of the variation in obesity rates, even when controlling for neighborhood financial capital. Without controlling for financial capital, social capital opportunity in a neighborhood is more strongly associated with obesity rate than social capital activity, so going forward we focus on opportunity rather than activity. Besides having a greater coefficient of multiple determination, opportunity is also more amenable to policy action than activity.

Table 3: Linear Model Association between Obesity Rates and High-Level Foursquare Category Features
Coefficient of determination    Venues    Check-ins
R²                              0.7357    0.6984
R² (controlling income)         0.5819    0.6907
R² (controlling net worth)      0.4791    0.5540

Moving beyond the high-level categories, we also consider the 664 lower-level categories and fit a linear regression of obesity rate on each independently, which poses a multiple testing problem [4]. As is standard, we do not consider collinearity among these features. Figure 3 shows scatter plots of the uncorrected p-values of these linear regressions as a function of their r² values, for both activity and opportunity. The p-values are exceedingly small, so they remain statistically significant even if multiplied 1000-fold, as one might in a very conservative application of the Bonferroni correction.

[Figure 2: Manhattan has the highest number of check-ins and the lowest obesity rates compared to the other boroughs, the “Manhattan Effect”. (a) Obesity rate distribution in NYC; lighter colors correspond to lower obesity rates and darker colors to higher obesity rates. (b) Distribution of the number of check-ins in New York City; lighter colors correspond to more check-ins and darker colors to fewer check-ins.]

Table 4 lists summary statistics for both activity and opportunity. Although fewer opportunity (venue) categories than activity (check-in) categories have r² above 0.2 (59 versus 131), opportunity performs better in the multivariate regression model for predicting obesity rates in neighborhoods (Table 3). Figure 4 shows subcategories of opportunity that are negatively correlated with obesity rates, whereas Figure 5 shows subcategories of opportunity that are positively correlated with obesity rates.

4. CONCLUSION
The environment has a strong effect on health outcomes, and although the built environment can be assessed at fine spatiotemporal scales using remote sensing [2], this has been difficult for the social environment. Here we proposed using data from an LBSN to perform a kind of social remote sensing, which logs data immediately and is therefore not subject to the cognitive biases of retrospective surveys.

Our exploratory research indicates that social capital indicators from an LBSN like Foursquare are indeed associated with public health measures of obesity rate in New York City.
Moreover, social capital opportunity, in terms of number of venues, is more predictive than social capital activity, in terms of number of check-ins. We can interpret this in terms of “social capital deserts,” in analogy to the “food deserts” that have previously been associated with obesity: a lack of opportunities to build social capital would increase the risk of high obesity rates.

Some points to note in interpreting the results, which may also explain why opportunity is more indicative than activity:
• Sampling bias in check-ins, since Foursquare users are limited to certain demographic groups, e.g. people with smartphones.
• The “Manhattan Effect”: Manhattan has the most venues and check-ins, and since its overall obesity rate is lower than that of the other boroughs, it may bias the regression results.
• Seasonality, since check-in data were collected in the summer.

In future work, we intend to address these statistical issues, which may be biasing our results, drawing on advanced methodologies in data science. For example, mitigating the “Manhattan Effect” is at the top of our agenda. Controlling for population demographics will help with the sampling bias of Foursquare users. Longer-term data collection, instead of a two-month collection period, could mitigate possible seasonality effects. Another data limitation to be addressed is that where people live (and report public health measures) differs from where they conduct social activities; perhaps population flow statistics from taxi cabs and subways can be used.

The same basic data and association methodology can also be applied to other chronic diseases such as diabetes. Linking people’s interests (as measurable via Facebook [7]) with their social opportunities may also allow a better understanding of what leads to their activities, and give a broader sense of the social environment in public health.
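The per-subcategory screen of Section 3 (one univariate regression per lower-level category, the Table 4 summaries, and the Bonferroni argument) can be sketched with `scipy.stats.linregress`. With 664 tests, the Bonferroni correction multiplies each p-value by 664, so the 1000-fold multiplier mentioned in Section 3 is stricter still. The function names are hypothetical; constant features are skipped because a regression on them is undefined; the sign of the slope separates negatively correlated subcategories (as in Figure 4) from positively correlated ones (as in Figure 5); and the summary uses the population variance, an assumption since the paper does not specify which variance it reports.

```python
from typing import Dict, List, Optional, Sequence, Tuple

import numpy as np
from scipy import stats

Result = Tuple[str, float, float, float]  # (name, slope, r^2, uncorrected p-value)


def screen_categories(M: np.ndarray, y: np.ndarray, names: Sequence[str]) -> List[Result]:
    """Regress y (obesity rates) on each row of M (one category's counts) separately."""
    results: List[Result] = []
    for name, x in zip(names, M):
        if x.max() == x.min():
            continue  # constant feature: slope and p-value undefined
        fit = stats.linregress(x, y)
        results.append((name, float(fit.slope), float(fit.rvalue) ** 2, float(fit.pvalue)))
    return results


def bonferroni(pvalues: Sequence[float], m: Optional[int] = None) -> List[float]:
    """Multiply each p-value by the number of tests m, capping at 1."""
    m = len(pvalues) if m is None else m
    return [min(1.0, p * m) for p in pvalues]


def summarize(results: List[Result], threshold: float = 0.2) -> Dict[str, float]:
    """Table 4-style summary of r^2 and uncorrected p-values across categories."""
    r2 = np.array([r[2] for r in results])
    p = np.array([r[3] for r in results])
    return {
        "n_above_threshold": int(np.sum(r2 > threshold)),
        "mean_r2": float(r2.mean()),
        "median_r2": float(np.median(r2)),
        "var_r2": float(r2.var()),  # population variance (assumption)
        "mean_p": float(p.mean()),
        "median_p": float(np.median(p)),
        "var_p": float(p.var()),
    }
```

Because each category is tested on its own, this screen ignores collinearity among features, matching the paper’s remark that this is standard practice for such screens.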
[Figure 3: Scatter plots of r² and uncorrected p-value for (a) number of venues and (b) number of check-ins.]

Table 4: Check-in and Venue Statistics

(a) Check-ins Data
Number of categories with r² > 0.2: 131
Attribute    r²          Uncorrected p-value
Mean         0.3197      9.696 × 10⁻¹³
Median       0.3140      1.837 × 10⁻¹⁸
Variance     0.006889    1.221 × 10⁻²²

(b) Venue Data
Number of categories with r² > 0.2: 59
Attribute    r²          Uncorrected p-value
Mean         0.2792      9.173 × 10⁻¹²
Median       0.2499      1.108 × 10⁻¹⁷
Variance     0.005265    2.784 × 10⁻²¹

Our results are given at the level of neighborhoods, but perhaps predictions can be disaggregated even further, to the level of postal code regions or even city blocks. Finding the level of granularity at which aggregate behavior becomes individual behavior is of general interest for public health surveillance. Indeed, the overarching public health goal of our data-driven study is to provide finer spatial granularity of chronic disease, so that social capital interventions can be prioritized and targeted to make the best use of limited resources.

5. REFERENCES
[1] Obesity: Preventing and managing the global epidemic. Technical Report 894, World Health Organization, 2002.
[2] B. Abelson, K. R. Varshney, and J. Sun. Targeting direct cash transfers to the extremely poor. In Proc. 20th ACM SIGKDD Int. Conf. Knowl. Discov. Data Min. (KDD ’14), pages 1563–1572, Aug. 2014.
[3] S. E. Alajajian, J. R. Williams, A. J. Reagan, S. C. Alajajian, M. R. Frank, L. Mitchell, J. Lahne, C. M. Danforth, and P. S. Dodds. The lexicocalorimeter: Gauging public health through caloric input and output on social media. arXiv:1507.05098v1 [physics.soc-ph], July 2015.
[4] H. Bannerman-Thompson, M. B. Rao, and R. Chakraborty. Multiple testing of hypotheses in biomedical research. In C. R. Rao, R. Chakraborty, and P. K. Sen, editors, Bioinformatics in Human Health and Heredity, pages 201–238. Elsevier, 2012.
[5] N. Cameron, N. G. Norgan, and G. T. H. Ellison, editors. Childhood Obesity: Contemporary Issues. CRC Press, Boca Raton, FL, 2006.
[6] N. A. Christakis and J. H. Fowler. The spread of obesity in a large social network over 32 years. New Engl. J. Med., 357(4):370–379, July 2007.
[7] R. Chunara, L. Bouton, J. W. Ayers, and J. S. Brownstein. Assessing the online social environment for surveillance of obesity prevalence. PLoS ONE, 8(4):e61373, 2013.
[8] J. Cranshaw, R. Schwartz, J. Hong, and N. Sadeh. The livehoods project: Utilizing social media to understand the dynamics of a city. In Proc. 6th Int. AAAI Conf. Weblogs and Social Media (ICWSM), June 2012.
[9] A. Culotta. Estimating county health statistics with Twitter. In Proc. SIGCHI Conf. Hum. Factors Comput. Syst. (CHI 2014), pages 1335–1344, Apr. 2014.
[10] G. Le Falher, A. Gionis, and M. Mathioudakis. Where is the Soho of Rome?: Measures and algorithms for finding similar neighborhoods in cities. In Proc. 9th Int. AAAI Conf. Weblogs and Social Media (ICWSM), pages 228–237, May 2015.
[11] M. Harding and M. Lovenheim. The effect of prices on nutrition: Comparing the impact of product- and nutrient-specific taxes. Working Paper 19781, NBER, Jan. 2014.
[12] D. R. Holtgrave and R. Crosby. Is social capital a protective factor against obesity and diabetes? Findings from an exploratory study. Ann. Epidemiol., 16(5):406–408, May 2006.
[13] D. Kim, S. V. Subramanian, S. L. Gortmaker, and I. Kawachi. US state- and county-level social capital in relation to obesity and physical inactivity: A multilevel, multivariable analysis. Soc. Sci. Med., 63(4):1045–1059, Aug. 2006.
[14] Y. Mejova, H. Haddadi, A. Noulas, and I. Weber. #FoodPorn: Obesity patterns in culinary interactions. In Proc. 5th Int. Conf. Digital Health 2015 (DH ’15), pages 51–58, May 2015.
[15] S. Moore, M. Daniel, C. Paquet, L. Dubé, and L. Gauvin. Association of individual network social capital with abdominal adiposity, overweight and obesity. J. Public Health, 31(1):175–183, Mar. 2009.
[16] J. M. Muckenhuber, T. E. Dorner, N. Burkert, F. Groschädl, and W. Freidl. Low social capital as a predictor for the risk of obesity. Health & Social Work, 40(2):e51–e58, May 2015.
[17] R. W. Pearson, M. Ross, and R. M. Dawes. Personal recall and the limits of retrospective questions in surveys. In J. M. Tanur, editor, Questions about Questions: Inquiries into the Cognitive Bases of Surveys, pages 65–94. Russell Sage Foundation, 1992.
[18] R. D. Putnam. Bowling Alone: The Collapse and Revival of American Community. Simon & Schuster, New York, 2000.
[19] T. H. Silva, P. O. S. Vaz de Melo, J. M. Almeida, M. Musolesi, and A. A. F. Loureiro. You are what you eat (and drink): Identifying cultural boundaries by analyzing food and drink habits in Foursquare. In Proc. 8th Int. AAAI Conf. Weblogs and Social Media (ICWSM), pages 466–475, June 2014.
[20] L. R. Varshney. Fundamental limits of data analytics in sociotechnical systems. 2015. Submitted.
[21] J. Yoon and T. T. Brown. Does the promotion of community social capital reduce obesity risk? J. Socio-Econ., 40(3):296–305, May 2011.
[22] A. X. Zhang, A. Noulas, S. Scellato, and C. Mascolo. Hoodsquare: Modeling and recommending neighborhoods in location-based social networks. In Proc. 2013 Int. Conf. Social Comput. (SocialCom), pages 69–74, Sept. 2013.
[Figure 4: Scatter plots with linear regression for some subcategories that are negatively correlated with obesity rates: (a) Arts & Crafts Store, (b) Bookstore, (c) Design Studio, (d) French Restaurant, (e) Wine Bar, (f) Wine Shop.]

[Figure 5: Scatter plots with linear regression for some subcategories that are positively correlated with obesity rates: (a) Automotive Shop, (b) Fast Food Restaurant, (c) Fried Chicken Joint, (d) Gas Station Garage, (e) Housing Development, (f) Subway.]