Social Media for Lifestyle Health: Multimedia

Social Media for Lifestyle Health
Yelena Mejova
Summer School Series on Methods
for Computational Social Science
July 23, 2019

http://www.who.int/mediacentre/factsheets/fs310/en/

https://www.oecd.org/health/health-systems/Obesity-Update-2017.pdf

self-motivated
plentiful
real-time
geo-located
media rich
social
cultural
interactive
self-image
noisy
bursty
geo-biased
complex signal
influence
contextual
persuasive

Social Media Lifestyle Health
1. What’s been done
2. Example pipeline

Social Media Obesity
You Tweet What You Eat: Studying Food Consumption Through Twitter
Sofiane Abbar, Yelena Mejova, Ingmar Weber @ CHI'15

data / topic refinement
food lexicon
streaming API
geo-located users
user histories
Twitter food lexicon 1,2,3…
Twitter food dataset

quantification
• estimating distribution of calories
• not per-item precision
• mean descriptors

geo-location
• GPS tagging of tweets (<5%)
• User location strings (up to 65%)
• Locations mentioned in tweets (unreliable)

population level
average caloric value of all foods mentioned in a
tweet, using exact keyword matching
vs
obesity statistics from the Centers for Disease Control and Prevention (CDC)
food keyword frequency vs obesity
r = 0.56***
final model: demographics + detected foods
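
The exact-keyword-matching step can be sketched in a few lines. This is a minimal illustration, not the authors' code: the lexicon entries and calorie values below are hypothetical placeholders, and the function names are invented for this example.

```python
import re

# Hypothetical lexicon: food keyword -> calories per 100 g (illustrative values only).
FOOD_LEXICON = {"pizza": 266.0, "salad": 20.0, "ice cream": 207.0}

def foods_in_tweet(text, lexicon=FOOD_LEXICON):
    """Return lexicon entries found in the tweet via exact whole-word matching."""
    lowered = text.lower()
    found = []
    for food in lexicon:
        # \b word boundaries keep "salad" from matching inside "salads"
        if re.search(r"\b" + re.escape(food) + r"\b", lowered):
            found.append(food)
    return found

def mean_calories(text, lexicon=FOOD_LEXICON):
    """Average caloric value of all lexicon foods mentioned, or None if none are."""
    foods = foods_in_tweet(text, lexicon)
    if not foods:
        return None
    return sum(lexicon[f] for f in foods) / len(foods)
```

Averaging these per-tweet values over all tweets from a region gives a population-level estimate that can then be compared with official obesity statistics.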

• Education &
Income:
– US Census at ZIP
code level
• Gender:
– genderize.io
37% f / 32% m
– neither:
– 26.7% not human,
excluded

crowdsourcing
• Got a complicated task that computational tools cannot handle
(well enough)?
• Break it into small pieces and have random people on
the internet do it
• Cheap per task
• Several labels per task for
quality assessment
• Are people any good at your
task? (upper bound)

rural urban
http://www.siani.se/event/tropentag-2013-agricultural-development-within-rural-urban-continuum/september-2013
top distinguishing foods by difference in probabilities of one group to another
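
A minimal sketch of ranking foods by the difference in their probabilities between two groups. The function name and the toy inputs are hypothetical; each group is assumed to be a non-empty list of food mentions.

```python
from collections import Counter

def top_distinguishing(group_a, group_b, k=3):
    """Rank foods by P(food | group A) - P(food | group B), largest first."""
    counts_a, counts_b = Counter(group_a), Counter(group_b)
    total_a, total_b = len(group_a), len(group_b)
    diffs = {}
    for food in set(counts_a) | set(counts_b):
        diffs[food] = counts_a[food] / total_a - counts_b[food] / total_b
    return sorted(diffs.items(), key=lambda kv: kv[1], reverse=True)[:k]
```

Swapping the two arguments gives the foods most characteristic of the other group.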

individual level
• #fat*problems
self-disclosure of
weight (possibly)
• High-precision
keyword detection of
interests/conditions in
user profiles
change in obesity rate

individual level
• WeFollow prominent
user lists, followership
as proxy for interest
top 15 factors by magnitude of coefficient

network level
• Friendship & Mention networks
• Social activation: users above 90th percentile in
terms of obesity and/or diabetes score
(personalized using Ridge regression on foods a
user has mentioned)
• Threshold model: success of a social diffusion
process depends on reaching a certain critical
number of adopters
• Activation probability given x of your neighbors
are activated users
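
The activation probability in the last bullet can be estimated empirically. The sketch below assumes the network is available as a dictionary from each user to a set of neighbors and that the activated users (those above the 90th percentile) have already been identified; both data structures and the function name are assumptions made for this example.

```python
from collections import defaultdict

def activation_by_exposure(neighbors, activated):
    """Estimate P(user is activated | x of the user's neighbors are activated).

    neighbors: dict mapping user -> set of neighboring users
    activated: set of activated users
    """
    totals = defaultdict(int)  # number of users with exactly x activated neighbors
    hits = defaultdict(int)    # how many of those users are activated themselves
    for user, nbrs in neighbors.items():
        x = sum(1 for n in nbrs if n in activated)
        totals[x] += 1
        if user in activated:
            hits[x] += 1
    return {x: hits[x] / totals[x] for x in sorted(totals)}
```

Plotting the returned probabilities against x shows whether activation rises sharply once a critical number of neighbors is reached, which is what a threshold model predicts.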

Friendship Network Mention Network
network level
• Content spread: remove replies & retweets
• Geography: remove links from same state

Social Media Food Deserts
Characterizing dietary choices, nutrition, and language in
food deserts via social media
De Choudhury, Sharma, Kiciman @ CSCW'16

Food desert:
– Low-income census tracts with a substantial
number or share of residents with low levels of
access to retail outlets selling healthy and
affordable foods
• an estimated 13.5
million people in the
United States have low
access to a
supermarket or large
grocery store, with 82
percent living in urban
areas.
https://www.ers.usda.gov/data-products/food-access-research-atlas/documentation/

• more precise
geo-location
• more
multimedia

data collection
canonical
food lexicon
geo-located posts
census tracts
14 million posts
8 million users
July 2013 - March 2015
35.5% posts with geo-location tags

geo-location 2
• knowing long/lat of
content
• obtain mapping to
US Census tract

• need to control for confounding variables
when comparing to a control set
control / treatment

Socioeconomic Variables
population
% minority population
#households
#families
% non-Hispanic whites
median house age
median family income
owner occupied housing units
distressed/underserved tract
Similarity via Mahalanobis distance
Selection via k Nearest Neighbors
Discard FD tracts without good enough matches
matching
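
A minimal sketch of the matching step, restricted to two covariates so the covariance matrix can be inverted by hand. The tract IDs, the covariate values, pooling treated and control tracts to estimate the covariance, and the `max_dist` cutoff are all assumptions for illustration; the slide lists nine variables and does not give a threshold.

```python
def covariance_2d(rows):
    """Sample covariance (sxx, sxy, syy) of a list of (x, y) pairs."""
    n = len(rows)
    mx = sum(r[0] for r in rows) / n
    my = sum(r[1] for r in rows) / n
    sxx = sum((r[0] - mx) ** 2 for r in rows) / (n - 1)
    syy = sum((r[1] - my) ** 2 for r in rows) / (n - 1)
    sxy = sum((r[0] - mx) * (r[1] - my) for r in rows) / (n - 1)
    return sxx, sxy, syy

def mahalanobis_2d(u, v, cov):
    """Mahalanobis distance between two 2-D points under covariance cov."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = u[0] - v[0], u[1] - v[1]
    # Uses the closed-form inverse of [[sxx, sxy], [sxy, syy]]
    d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return d2 ** 0.5

def match_tracts(treated, controls, k=1, max_dist=1.0):
    """For each treated tract keep its k nearest controls within max_dist."""
    cov = covariance_2d(list(treated.values()) + list(controls.values()))
    matches = {}
    for tid, tv in treated.items():
        ranked = sorted((mahalanobis_2d(tv, cv, cov), cid)
                        for cid, cv in controls.items())
        close = [cid for d, cid in ranked[:k] if d <= max_dist]
        if close:  # treated tracts without a good enough match are discarded
            matches[tid] = close
    return matches
```

With more covariates the same idea applies, but the inverse covariance would be computed with a linear algebra library rather than by hand.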

• differences in nutrition
between FD and matched
NFD can be big
• but they also vary across
the region
Statistical significance in nutritional
attributes of FDs and NFDs, with
Bonferroni adjustment of α
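
The Bonferroni adjustment divides the significance level α by the number of tests performed. A minimal sketch, with hypothetical attribute names and p-values:

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag tests that remain significant after dividing alpha by the number of tests."""
    adjusted_alpha = alpha / len(p_values)
    return {name: p < adjusted_alpha for name, p in p_values.items()}
```

With three nutritional attributes and α = 0.05, each individual test must reach p < 0.0167 to count as significant.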

predicting whether a
tract is a food desert
• S = socioeconomic
• F = food
deprivation
• T = topic
distribution
classification
Instagram topic information helps!

error analysis
• social media helps
identify recent
developments like
gentrification
Atlanta, Georgia
Popular topics:
“smoothie”, “organic”,
“farmtotable”, “baking”

Social Media Images Disclosure
Is #Saki Delicious?: The Food Perception Gap on Instagram
and Its Relation to Health
Ofli, Aytar, Weber, al Hammouri, Torralba @ WWW'17

• most studies take
hashtags as true
description of image
content, but are
they?

data collection
• query: #food,
#foodporn,
#foodie,
#breakfast,
#lunch, #dinner
• 72M images, 26M
with location
• 4M assigned to US
county
• 3.7M on hashtags
canonical
food lexicon
food-related posts
geo-localized
food-related posts
10,000 hashtags
as food-related

geo-location 3
• Geo-location:
– county shape files from US Census
• https://www.census.gov/programs-surveys/geography.html
– shapely python library
• https://github.com/Toblerity/Shapely
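
Assigning a post to a county comes down to a point-in-polygon test against the county shapes. The sketch below is a pure-Python stand-in that uses ray casting on simple polygons, so it runs without extra dependencies; with Shapely installed, the same check is typically written as `Polygon(...).contains(Point(lon, lat))`. The FIPS codes and square shapes in the usage example are hypothetical.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test; polygon is a list of (lon, lat) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray going east from the point cross this edge?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside

def locate(lon, lat, shapes):
    """Return the ID of the first shape containing the point, else None."""
    for shape_id, polygon in shapes.items():
        if point_in_polygon(lon, lat, polygon):
            return shape_id
    return None
```

For example, with `shapes = {"00001": [(0, 0), (2, 0), (2, 2), (0, 2)]}`, calling `locate(1, 1, shapes)` returns `"00001"`. Real county shapes have holes and multiple parts, which is one reason to use a geometry library for production work.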

image labeling
• Deep residual network – “can train substantially
deeper models”
– He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep
residual learning for image recognition. In Proceedings
of the IEEE Conference on Computer Vision and
Pattern Recognition (pp. 770-778).
– In the training procedure, the final 1000-way softmax
in the deep residual model is replaced with a 101-way
softmax, and the model is fine-tuned on the Insta-101
and Food-101 datasets individually

image labeling
L. Bossard, M. Guillaumin, and L. Van Gool. Food-101–mining
discriminative components with random forests. In European
Conference on Computer Vision, 2014.
101 food categories, 101,000 images (750 train / 250 test)
Instagram images matched manually to Food-101 categories
(4000 train / 250 test, no manual cleaning)

perception gap
• difference in how a machine
and a human annotate a
given image
• then aggregated for user (no
one user contributes too
much), then per county

perception gap
machine is more likely than human to
label post #instabeer in places where
there is higher food insecurity
machine is less likely than human to
label post #sushiroll in places where
there is higher food insecurity
machine is less likely than human to
label post #chicagopizza in places
where there is higher alcohol-related
driving deaths

variation in subjective labels
• for a subjective user tag j, and each machine tag i,
compute P(j|i)
• computed first for a user, then aggregated per
county
• focus on #healthy, #delicious, #organic
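
A minimal sketch of estimating P(j|i), the probability that a post carries subjective user tag j given that the machine assigned tag i. Posts are assumed to be pairs of tag sets; that representation and the function name are illustrative choices, not taken from the paper.

```python
def conditional_tag_prob(posts, user_tag, machine_tag):
    """P(user_tag | machine_tag) over posts given as (user_tags, machine_tags) pairs."""
    with_machine = [u for u, m in posts if machine_tag in m]
    if not with_machine:
        return None  # machine tag never observed, probability undefined
    return sum(1 for u in with_machine if user_tag in u) / len(with_machine)
```

Following the slide, this would be computed on each user's posts first and the per-user values then averaged within a county, so that no single prolific user dominates.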

variation in subjective labels

• Social media can be a source of cheap training
data
• Images provide alternative “view” to hashtags
observations

Social Media Images Health
Social Media Image Analysis for Public Health
Garimella, Alfayad, Weber @ CHI’16

data collection
• geo-located Instagram posts at restaurants
• mapped to US counties using Federal
Communication Commission API
• top 100 counties by image count used
• 2,000 images randomly selected for each
county

Imagga
• returns tags with a
confidence score (use
tags with at least 20%
confidence)
• tags appearing in fewer
than 10 counties are
ignored
• free account limit 1
image per second
try it! https://imagga.com/auto-tagging-demo
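
The two filtering rules on this slide can be sketched as follows. The input structure (a dictionary from county to a list of tag and confidence pairs) is an assumption, as is the 0 to 100 confidence scale; the thresholds mirror the slide.

```python
from collections import defaultdict

def filter_tags(county_tags, min_confidence=20.0, min_counties=10):
    """Keep confident tags that occur in at least min_counties counties.

    county_tags: dict county -> list of (tag, confidence) pairs
    """
    # Rule 1: drop tags below the confidence threshold
    confident = {
        county: {tag for tag, conf in tags if conf >= min_confidence}
        for county, tags in county_tags.items()
    }
    # Rule 2: drop tags seen in too few counties
    counties_per_tag = defaultdict(int)
    for tags in confident.values():
        for tag in tags:
            counties_per_tag[tag] += 1
    keep = {t for t, n in counties_per_tag.items() if n >= min_counties}
    return {county: tags & keep for county, tags in confident.items()}
```

When calling the tagging service in a loop, a `time.sleep(1)` between requests is one simple way to stay within the free account limit mentioned above.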

predicting public health
Ridge regression with smoothing parameter α=0.1
U = user-provided tags, I = machine-generated tags, D = demographics
Showing correlation between predicted and known health statistics
Statistical significance of demography-only baseline

imagga tags user tags + demog user tags

Physically Inactive
top distinguishing features

Social Media Images Mental Health
What Twitter Profile and Posted Images Reveal
about Depression and Anxiety
Guntuku, Preotiuc-Pietro, Eichstaedt, Ungar @ ICWSM’19

data collection
• surveys administered on platform
Qualtrics
• demography, Beck’s Depression
Inventory
• informed consent
• Twitter handles
• 560 people with 20 or more images
posted
• + Facebook dataset, text only
high depression / low depression

image description
• Hue – Saturation – Value (HSV)
• Hue count (professional photos have fewer hues)
• 6-bin and 12-bin color histograms
• Warm & cold colors
• Aesthetics: deep CNN that produces labels such as object emphasis,
rule of thirds, symmetry, motion blur, vivid color…
• Content: Imagga tags (top 10) clustered via Normalized Pointwise
Mutual Information (NPMI)
• Content: VGG-Net image classifier for 1,000 objects
• Face features via Face++ and EmoVu APIs (emotions, smiling)
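
Normalized Pointwise Mutual Information, used above to cluster tags, can be computed from co-occurrence counts. This sketch uses the standard definition NPMI(x, y) = ln(p(x,y) / (p(x) p(y))) / (-ln p(x,y)); the count-based interface is an assumption for illustration.

```python
import math

def npmi(count_xy, count_x, count_y, total):
    """NPMI of two tags from image counts; ranges from -1 to 1."""
    if count_xy == 0:
        return -1.0  # the tags never co-occur
    p_xy = count_xy / total
    if p_xy == 1.0:
        return 1.0  # both tags appear on every image
    p_x = count_x / total
    p_y = count_y / total
    pmi = math.log(p_xy / (p_x * p_y))
    return pmi / -math.log(p_xy)
```

Values near 1 indicate tags that almost always appear together, values near 0 indicate independence, and -1 means the tags never co-occur.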

Pearson correlations
between color and
aesthetic features extracted
from images and mental
health conditions, and with
age and gender (coded as 1
for female, 0 for male)
separately. Correlations for
mental health conditions
are controlled for age,
gender and other mental
health condition. Only
significant correlations (p <
.01, two-tailed t-test) are
presented.

“while depressed users
preferred images which are
not sharp and which do not
show face, anxious users
usually chose sharper
images with multiple
people in them”

Pearson correlation between Imagga tag clusters and mental health conditions

mental health classification
Using visual features of posted images in predicting mental health conditions. Using
linear regression with ElasticNet regularization. Performance measured via Pearson
correlation (MSE in brackets). ST – single task, MT – multi-task
Additional training on text-based features (users labeled for age and gender) boosts
performance up to r = .167 for depression and r = .223 for anxiety.

observations
• easier to model populations than individuals
• easier to model behaviors than mental states
• text often describes the images (hashtags)
• getting ground truth can be difficult

Could you do it better?
Can people guess a person’s BMI better from a picture than a machine?
Face-to-BMI: Using Computer Vision to Infer Body Mass Index on Social Media
Garimella, Kocabey, Camurcu, Ofli, Aytar, Marin, Torralba, Weber @ ICWSM’17

Male
30 years old
6 feet 5 inches tall
Starting weight 385lb
Current weight 310lb
Goal weight: 225lb

data collection
• Cropping images with known gender, height, and weight from Reddit
• 4206 faces with BMI = body mass in kg / (body height in m)²
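
Because the Reddit posts report height and weight in imperial units, computing BMI requires a unit conversion first. A minimal sketch, applied to the example profile shown earlier (6 feet 5 inches, 310 lb current weight); the function name is hypothetical.

```python
LB_TO_KG = 0.45359237   # exact definition of the pound in kilograms
INCH_TO_M = 0.0254      # exact definition of the inch in meters

def bmi_from_imperial(feet, inches, pounds):
    """BMI = body mass in kg / (body height in m) squared."""
    height_m = (feet * 12 + inches) * INCH_TO_M
    weight_kg = pounds * LB_TO_KG
    return weight_kg / height_m ** 2

current = bmi_from_imperial(6, 5, 310)
print(round(current, 1))
```

For that profile the current BMI comes out at roughly 36.8, and the starting weight of 385 lb corresponds to roughly 45.7.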

face-to-BMI modeling
• Use pre-trained deep learning features:
– general object classification (VGG-Net)
– face recognition task (VGG-Face)
• Then train on the BMI dataset, test on held-out set

crowdsourcing BMI
• simpler than guessing
BMI: comparing BMI of
two pictures (M-M, F-F,
M-F)
• use Amazon
Mechanical Turk, 3
labels per task
• compared to machine,
human performance
differs by 2%

bias?
• Algorithm could be learning existing stereotypes (ex:
African Americans tend to have higher obesity rates in US)
• Try a balanced set of 2000 male-female pairs: in 1037, the
female was chosen as having the higher BMI (p = 0.05)
• Try a balanced set of 2000 White-African American pairs: in
1085, the White person was chosen as having the higher BMI (p = 0.05)
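
One way to check whether counts like these differ from a 50/50 split is an exact binomial test. The slide does not say which test was used or whether p = 0.05 is a computed value or a significance threshold, so the one-sided test below is only an assumption about how such a check could be run.

```python
from math import comb

def binomial_tail(successes, trials):
    """One-sided exact P(X >= successes) for X ~ Binomial(trials, 0.5)."""
    favourable = sum(comb(trials, k) for k in range(successes, trials + 1))
    return favourable / 2 ** trials

print(binomial_tail(1037, 2000))
print(binomial_tail(1085, 2000))
```

Python's arbitrary-precision integers keep the sum exact before the final division, so no statistical library is needed for pair counts of this size.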

Bonus: recipe + image
Learning Cross-modal Embeddings for Cooking Recipes and Food Images
Salvador, Hynes, Aytar, Marin, Ofli, Weber, Torralba @ CVPR'17

• combining recipe information (ingredients + instructions) with images
• crawled recipe websites, standardized the information: Recipe1M dataset

• use LSTMs for modeling ingredients and two-stage LSTMs for modeling cooking
instructions, deep convolutional networks for image representation
• Semantic Regularization learns mapping between images and food categories

observations
• people annotate their photos in some settings
• cannot see some ingredients but may find
associations (coke -> sugar?)

• Getting social media data
– Ex: Twitter stream listener
• Geolocating users
– Ex: CLAVIN
• Linking to health statistics & census
– Ex: County Health Rankings & IStat
• Labeling images
– Ex: Imagga
• Linking to nutritional information
– Ex: Nutrition lexicon from recipe
• Relating diet to obesity

Twitter Streaming Pipeline
• Need to think of keywords at beginning of project
• But get a lot of data over time
• Make sure job is always running, restart if necessary, store data in small chunks
[Diagram components: Twitter, Streaming API client, Twitter Job Watcher (crontab), daily dumps, log]

TwitterStreamingAPI.py
(API keys are bogus)

JobWatcher.pl
is the job still running?
what is the last file saved?
get today’s formatted date
output status
if it’s not running, start the job
if it’s running but it’s next day, restart it
process yesterday’s data
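
The watcher's decision logic can be sketched as a small pure function. The original script is written in Perl and is not reproduced here; the Python below is an illustrative stand-in, and the function names and file naming scheme are hypothetical.

```python
from datetime import date

def watcher_action(is_running, last_dump_date, today):
    """Decide what the cron-driven watcher should do on this run."""
    if not is_running:
        return "start"
    # Running, but still writing to an older day's file: roll over
    if last_dump_date is not None and last_dump_date < today:
        return "restart"
    return "none"

def dump_filename(day, prefix="tweets"):
    """One file per day keeps the chunks small and easy to reprocess."""
    return f"{prefix}_{day.strftime('%Y%m%d')}.json"
```

A crontab entry would call the watcher every few minutes; after a "restart", the previous day's file is complete and can be handed to the parsing step.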

parseTwitterJSON.py
Watch out for
encoding
tabs
newlines
errors
There will always be errors
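
A minimal sketch of defensive parsing for one line of a Twitter dump. It flattens tabs and newlines so the text fits in one tab-separated field and skips anything that fails to parse. The field names `id`, `user.screen_name`, and `text` follow the classic tweet JSON layout; the choice of output columns is an assumption for this example.

```python
import json

def clean_text(text):
    """Collapse tabs, newlines, and repeated spaces into single spaces."""
    return " ".join(text.split())

def tweet_to_tsv(line):
    """Return 'id<TAB>screen_name<TAB>text', or None if the line is unusable."""
    try:
        tweet = json.loads(line)
        user = tweet["user"]["screen_name"]
        return "\t".join([str(tweet["id"]), user, clean_text(tweet["text"])])
    except (ValueError, KeyError, TypeError):
        # Broken JSON, delete notices, and other non-tweet messages end up here
        return None
```

Opening dump files with an explicit `encoding="utf-8"` avoids most encoding surprises, and counting the `None` results gives a quick measure of how much of the stream was lost.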

using non-streaming APIs
• careful with rate limits
• restrictions on amount of data (back in time,
number of items)
• deleted content or accounts are not there
• get historical interactions with content (likes)

geo-location: CLAVIN
• https://github.com/Berico-Technologies/CLAVIN
• uses allCountries.zip gazetteer file from GeoNames.org

The Somali Ministry of Information, Posts and Telecommunications started the process of distributing
6,000 hand-held radios to Internally Displaced Persons (IDPs) in Mogadishu. In the first batch, the
Ministry handed out 1,000 radios at Badbado camp, Somalia's largest IDP camp. The radios were
received by to those most in need: namely, female-headed households, elderly and youth groups.
The beneficiaries will receive news and important information concerning relief efforts and public
safety messages daily. The small emergency radios are both solar-powered and hand-cranked and can
also operate with batteries. The radios can be tuned in to multiple frequencies.
The Deputy Minister of Information, Posts and Telecommunications, H.E. Abdullahi Bile Nur, who
witnessed the distribution process at Badbado camp, said: "In any emergency, the first priority is the
delivery of critical aid, but communities need more than that. They also need information. It is
important for them to know where they can get water, where they can get certain facilities, how to
access those facilities."
"We believe the radios will make a difference in terms of morale and education;" he added. Radio
Mogadishu broadcasts a daily show named 'Recovery' (previously 'Help') that is packaged along with
the latest announcements and information from humanitarian agencies. The program offers guidance
on hygiene and sanitation, nutrition, child education, good neighborhood, becoming productive
members of the society, among other key topics.
Somalia’s Prime Minister Dr. Abdiweli Mohamed Ali received the European Union envoy to Somalia
Alexander Rondes in his office in Mogadishu today. The envoy was accompanied by EU officials and
others while Somali minister for defence Hussein Arab Isse and minister for foreign affairs were also
present in the meeting.
The premier warmly received the EU envoy and thanked him for visiting Mogadishu. He requested the
EU to double its efforts in the restoration of peace and stability in Somalia.
The two leaders discussed the strategic plans of setting up control and authority in the areas reclaimed
from Al-shabab in order to deliver the much needed public services and humanitarian aid to the
people.
The meeting by the premier and the EU envoy also highlighted the upcoming London meeting which
aims at delivering a new international approach to Somalia. The premier stated that the prevalent
security made by the government provides an opportune time for consolidation of such gains.
The special envoy, who was appointed by EU to represent the horn of Africa region, stated that he will
give priority to Somalia. It seems the international community is making concerted effort to the
Somalia issue after the government made important strides in security, the new constitution,
restructuring the parliament, local administrations’ cooperation and good governance.
Resolved "European Union" as: "European Union" {European Union (No Man's Land, )
[pop: 0] <6695072>}, position: 1590, confidence: 1.000000, fuzzy: false
[50.83857,4.37599]
Resolved "London" as: "london"
{com.bericotech.clavin.gazetteer.LazyAncestryGeoName@285818}, position: 2306,
confidence: 1.000000, fuzzy: false
[51.50853,-0.12574]
Resolved "Africa" as: "Africa" {Africa (No Man's Land, ) [pop: 1031833000]
<6255146>}, position: 2586, confidence: 1.000000, fuzzy: false
[7.1881,21.09375]
Resolved "Muna" as: "Muna"
{com.bericotech.clavin.gazetteer.LazyAncestryGeoName@35c23f}, position: 2957,
[20.48794,-89.71387]
Resolved "Mataban" as: "Mataban"
{com.bericotech.clavin.gazetteer.LazyAncestryGeoName@d460}, position: 4161,
[5.20401,45.53353]
Resolved "Birmingham" as: "Birmingham"
{com.bericotech.clavin.gazetteer.LazyAncestryGeoName@28866c}, position: 4841,
[52.48142,-1.89983]
Resolved "Hiran" as: "Hiran"
{com.bericotech.clavin.gazetteer.LazyAncestryGeoName@7b1830}, position: 5208,
[14.44998,45.57068]
Resolved "London" as: "london"
{com.bericotech.clavin.gazetteer.LazyAncestryGeoName@285818}, position: 6129,
[51.50853,-0.12574]
Resolved "East Africa" as: "Portuguese East Africa" {Mozambique (Mozambique, 00)
[pop: 22061451] <1036973>}, position: 7589, confidence: 1.000000, fuzzy: false
[-18.25,35.0]
In: Out:
Check Quality of Matching for Most Frequent Terms!
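
To check the most frequent resolutions, the textual output shown above can be tallied with a regular expression. The pattern below is written against the `Resolved "..." as: "..."` lines in this example output only; other CLAVIN versions may format their output differently.

```python
import re
from collections import Counter

RESOLVED = re.compile(r'Resolved "([^"]+)" as: "([^"]+)"')

def resolution_counts(clavin_output):
    """Count (mention, resolved name) pairs in CLAVIN's textual output."""
    return Counter(RESOLVED.findall(clavin_output))
```

Sorting with `most_common()` puts the frequent pairs first, so questionable matches such as "East Africa" resolving to "Portuguese East Africa" can be reviewed by hand before they distort the results.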

GPS to FIPS (USA)
often need smaller
administrative unit ID

Linking to personal health
• Crowdsourcing
• Personal health,
demographics,
beliefs…
• Experimentation
• IRB approval!

curl -u "acc_3d8aa27c6368803:be371863f7b7e4715d17a75ff759f44d" \
  "https://api.imagga.com/v2/tags?image_url=https://s.yimg.com/ge/labs/v1/uploads/20131031_093854-300x225.jpg" \
  > imagga_example.out

Face Analyze
curl -X POST "https://api-us.faceplusplus.com/facepp/v3/face/analyze" \
  -F "api_key=Cc9Pnq-G01FVr5FO1unI-CtQMcPVl9YM" \
  -F "api_secret=VwWy_pcnyOdvP0I-0GBbBiaop1b-TjHD" \
  -F "return_landmark=1" \
  -F "return_attributes=gender,age,smiling,emotion,ethnicity,beauty,mouthstatus,skinstatus" \
  -F "face_tokens=0221733fc86b65b9188e26f6c7b85c56"

USDA
• Huge list
• Includes producer
• Too specific
• Repetitive
• Formal language

Recipes
• Also huge
• Human language
• No nutrient info
• Social info

Recipes + Ingredients + Nutrients
• Remove or flag
popular ambiguous
terms: honey, apple,
salsa, …
• Remember: food
density, not portions!

Fetishizing food in digital age: #Foodporn around the world.
Mejova, Abbar, Haddadi @ ICWSM'16

@yelenamejova
yelenamejova.com
yelena.mejova@isi.it
Slides:
slideshare.net
