Social Media for Lifestyle Health
Yelena Mejova
Summer School Series on Methods
for Computational Social Science
July 23, 2019
http://www.who.int/mediacentre/factsheets/fs310/en/
https://www.oecd.org/health/health-systems/Obesity-Update-2017.pdf
social media
self-motivated
plentiful
real-time
geo-located
media rich
social
cultural
interactive
self-image
noisy
bursty
geo-biased
complex signal
influence
contextual
persuasive
Social Media Lifestyle Health
1. What’s been done
2. Example pipeline
Social Media Obesity
You Tweet What You Eat: Studying Food Consumption Through Twitter
Sofiane Abbar, Yelena Mejova, Ingmar Weber @ CHI'15
data / topic refinement
food lexicon
streaming API
geo-located users
user histories
Twitter food lexicon 1,2,3…
Twitter food dataset
quantification
• estimating distribution of calories
• not per-item precision
• mean descriptors
geo-location
• GPS tagging of tweets (<5%)
• User location strings (up to 65%)
• Locations mentioned in tweets (unreliable)
population level
average caloric value of all foods mentioned in a tweet, using exact keyword matching
vs
obesity rates from the Centers for Disease Control and Prevention (CDC)
food keyword frequency vs obesity
r = 0.56***
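The per-tweet quantification above can be sketched as follows. This is a minimal illustration, not the authors' code: the lexicon entries and their kcal values are hypothetical placeholders, and averaging per tweet and then per region is an assumption about the aggregation.

```python
import re

# Hypothetical lexicon: food keyword -> kcal per 100 g (illustrative values only).
FOOD_KCAL = {"pizza": 266.0, "salad": 20.0, "apple pie": 237.0, "bacon": 541.0}

# Longest keywords first so multi-word entries win over shorter overlapping ones.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(k) for k in sorted(FOOD_KCAL, key=len, reverse=True))
    + r")\b"
)


def tweet_calories(text):
    """Mean caloric density of all lexicon foods mentioned in one tweet, or None."""
    hits = _PATTERN.findall(text.lower())
    if not hits:
        return None
    return sum(FOOD_KCAL[h] for h in hits) / len(hits)


def region_calories(tweets):
    """Mean of per-tweet values over the tweets that mention at least one food."""
    values = [v for v in map(tweet_calories, tweets) if v is not None]
    return sum(values) / len(values) if values else None
```

The result is a density (kcal per 100 g), not an estimate of what was actually eaten, which is why only distributions and means are meaningful here.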
final model: demographics + detected foods
• Education &
Income:
– US Census at ZIP
code level
• Gender:
– genderize.io
37% f / 32% m
– neither:
– 26.7% not human,
excluded
crowdsourcing
• Got a complicated task computational tools cannot handle
(well enough)?
• Break it into small pieces and have random people on
the internet do it
• Cheap per task
• Several labels per task for
quality assessment
• Are people any good at your
task? (upper bound)
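Collecting several labels per task only helps if they are combined somehow. A minimal sketch, assuming simple majority voting with the share of agreeing workers as a quality signal (the slide does not say which aggregation was used):

```python
from collections import Counter


def aggregate_labels(labels):
    """Return (majority_label, agreement) for one task.

    agreement is the share of workers who gave the most common label.
    A tie for first place returns None as the label so it can be re-annotated.
    """
    counts = Counter(labels).most_common()
    top_label, top_count = counts[0]
    agreement = top_count / len(labels)
    if len(counts) > 1 and counts[1][1] == top_count:
        return None, agreement
    return top_label, agreement
```

Average agreement across tasks gives a rough feel for how hard the task is for people, which bounds what a classifier trained on these labels can be expected to reach.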
rural urban
http://www.siani.se/event/tropentag-2013-agricultural-development-within-rural-urban-continuum/september-2013
top distinguishing foods by difference in probabilities of one group to another
individual level
• #fat*problems
self-disclosure of
weight (possibly)
• High-precision
keyword detection of
interests/conditions in
user profiles
change in obesity rate
>
individual level
• WeFollow prominent
user lists, followership
as proxy for interest
top 15 factors by magnitude of coefficient
network level
• Friendship & Mention networks
• Social activation: users above 90th percentile in
terms of obesity and/or diabetes score
(personalized using Ridge regression on foods a
user has mentioned)
• Threshold model: success of a social diffusion
process depends on reaching a certain critical
number of adopters
• Activation probability given x of your neighbors
are activated users
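The last bullet, activation probability as a function of the number of activated neighbors, can be estimated directly from an adjacency list. A minimal sketch; the input shapes (`neighbors` as a dict of lists, `activated` as a set) are assumptions for illustration:

```python
from collections import defaultdict


def activation_by_exposure(neighbors, activated):
    """Map x -> share of users with exactly x activated neighbors who are activated.

    neighbors: dict user -> iterable of neighboring users
    activated: set of users above the activation threshold
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for user, friends in neighbors.items():
        x = sum(1 for f in friends if f in activated)
        totals[x] += 1
        if user in activated:
            hits[x] += 1
    return {x: hits[x] / totals[x] for x in sorted(totals)}
```

A curve that rises sharply past some x is what a threshold model would predict; a flat curve suggests no social effect at this level of analysis.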
Friendship Network Mention Network
network level
• Content spread: remove replies & retweets
• Geography: remove links from same state
Social Media Food Deserts
Characterizing dietary choices, nutrition, and language in
food deserts via social media
De Choudhury, Sharma, Kiciman @ CSCW'16
Food desert:
– Low-income census tracts with a substantial
number or share of residents with low levels of
access to retail outlets selling healthy and
affordable foods
• an estimated 13.5
million people in the
United States have low
access to a
supermarket or large
grocery store, with 82
percent living in urban
areas.
https://www.ers.usda.gov/data-products/food-access-research-atlas/documentation/
• more precise
geo-location
• more
multimedia
data collection
canonical
food lexicon
geo-located posts
census tracts
14 million posts
8 million users
July 2013 - March 2015
35.5% posts with geo-location tags
geo-location 2
• knowing long/lat of
content
• obtain mapping to
US Census tract
• need to control for confounding variables
when comparing to a control set
control / treatment
Socioeconomic Variables
population
% minority population
#households
#families
% non-Hispanic whites
median house age
median family income
owner occupied housing units
distressed/underserved tract
Similarity via Mahalanobis distance
Selection via k Nearest Neighbors
Discard FD tracts without good enough matches
matching
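The three matching steps above can be sketched in plain Python. This is an illustration under assumptions: tracts are given as dicts of id -> covariate vector, the covariance is pooled over both groups, and `max_dist` is a hypothetical cut-off standing in for "good enough".

```python
def covariance(rows):
    """Sample covariance matrix of a list of equal-length vectors."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [
        [sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
         for j in range(d)]
        for i in range(d)
    ]


def invert(m):
    """Gauss-Jordan inverse with partial pivoting."""
    n = len(m)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("singular covariance matrix")
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [row[n:] for row in a]


def mahalanobis(u, v, inv_cov):
    """sqrt((u - v)^T S^-1 (u - v))."""
    diff = [x - y for x, y in zip(u, v)]
    d = len(diff)
    return sum(diff[i] * inv_cov[i][j] * diff[j]
               for i in range(d) for j in range(d)) ** 0.5


def match_tracts(treated, controls, k=1, max_dist=None):
    """For each treated tract, return ids of its k nearest controls.

    Treated tracts whose neighbors all exceed max_dist are dropped.
    """
    inv_cov = invert(covariance(list(treated.values()) + list(controls.values())))
    matches = {}
    for tid, tv in treated.items():
        ranked = sorted(
            (mahalanobis(tv, cv, inv_cov), cid) for cid, cv in controls.items()
        )[:k]
        if max_dist is not None:
            ranked = [(dist, cid) for dist, cid in ranked if dist <= max_dist]
        if ranked:
            matches[tid] = [cid for _, cid in ranked]
    return matches
```

Mahalanobis distance rescales each covariate by its variance and accounts for correlation between covariates, so population counts do not dominate percentages.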
log likelihood ratio =
• differences in nutrition
between FD and matched
NFD can be big
• but they also vary across
the region
Statistical significance in nutritional
attributes of FDs and NFDs, with
Bonferroni adjustment of α
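The Bonferroni adjustment divides the significance level by the number of tests run. A minimal sketch (the default α of 0.05 is an assumption; the slide does not state the level used):

```python
def bonferroni(p_values, alpha=0.05):
    """Return (adjusted_alpha, flags); flags[i] is True if p_values[i] passes."""
    adjusted = alpha / len(p_values)
    return adjusted, [p < adjusted for p in p_values]
```

With five nutritional attributes tested at α = 0.05, each individual test has to clear 0.01.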
predicting whether a
tract is a food desert
• S = socioeconomic
• F = food
deprivation
• T = topic
distribution
classification
Instagram topic information helps!
error analysis
• social media helps
identify recent
developments like
gentrification
Atlanta, Georgia
Popular topics:
“smoothie”, “organic”,
“farmtotable”, “baking”
Social Media Images Disclosure
Is #Saki Delicious?: The Food Perception Gap on Instagram
and Its Relation to Health
Ofli, Aytar, Weber, al Hammouri, Torralba @ WWW'17
• most studies take
hashtags as true
description of image
content, but are
they?
data collection
• query: #food,
#foodporn,
#foodie,
#breakfast,
#lunch, #dinner
• 72M images, 26M
with location
• 4M assigned to US
county
• 3.7M on hashtags
canonical
food lexicon
food-related posts
geo-localized
food-related posts
10,000 hashtags
as food-related
geo-location 3
• Geo-location:
– county shape files from US Census
• https://www.census.gov/programs-surveys/geography.html
– shapely python library
• https://github.com/Toblerity/Shapely
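Assigning a post to a county is a point-in-polygon test. With shapely this is `Polygon.contains(Point)`; the dependency-free sketch below uses ray casting instead as a stand-in, and the county rings in the usage example are made-up squares, not real shape-file data.

```python
def point_in_polygon(lon, lat, ring):
    """Ray-casting test; ring is a list of (lon, lat) vertices of a simple polygon."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does a horizontal ray from the point towards +lon cross this edge?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def assign_county(lon, lat, counties):
    """Return the id of the first county whose ring contains the point, else None."""
    for fips, ring in counties.items():
        if point_in_polygon(lon, lat, ring):
            return fips
    return None


# Hypothetical toy "counties": two adjacent unit squares.
TOY_COUNTIES = {
    "00001": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "00002": [(1, 0), (2, 0), (2, 1), (1, 1)],
}
```

For millions of posts a linear scan over all counties is slow; a spatial index or a bounding-box pre-filter is the usual next step.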
image labeling
• Deep residual network – “can train substantially
deeper models”
– He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep
residual learning for image recognition. In Proceedings
of the IEEE Conference on Computer Vision and
Pattern Recognition (pp. 770-778).
– In the training procedure, the final 1000-way softmax
in the deep residual model is replaced with a 101-way
softmax, and the model is fine-tuned on the Insta-101
and Food-101 datasets individually
image labeling
L. Bossard, M. Guillaumin, and L. Van Gool. Food-101–mining
discriminative components with random forests. In European
Conference on Computer Vision, 2014.
101 food categories, 101,000 images (750 train / 250 test)
Instagram images matched manually to Food-101 categories
(4000 train / 250 test, no manual cleaning)
perception gap
• difference in how a machine
and a human annotate a
given image
• then aggregated for user (no
one user contributes too
much), then per county
perception gap
machine is more likely than human to
label post #instabeer in places where
there is higher food insecurity
machine is less likely than human to
label post #sushiroll in places where
there is higher food insecurity
machine is less likely than human to
label post #chicagopizza in places
where there is higher alcohol-related
driving deaths
variation in subjective labels
• for a subjective user tag j, and each machine tag i,
compute P(j|i)
• computed first for a user, then aggregated per
county
• focus on #healthy, #delicious, #organic
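A sketch of the two-stage computation described above: P(j|i) per user, then an aggregate per county. Taking the plain mean over users is an assumption; the slide only says "aggregated".

```python
from collections import defaultdict


def user_conditional(posts, user_tag):
    """P(user_tag | machine_tag) for one user.

    posts: list of (machine_tag, set_of_user_tags) pairs, one per image.
    """
    seen = defaultdict(int)
    joint = defaultdict(int)
    for machine_tag, user_tags in posts:
        seen[machine_tag] += 1
        if user_tag in user_tags:
            joint[machine_tag] += 1
    return {m: joint[m] / seen[m] for m in seen}


def county_conditional(users, user_tag):
    """Mean of per-user P(j|i), so each user counts once however much they post."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for posts in users:
        for m, p in user_conditional(posts, user_tag).items():
            sums[m] += p
            counts[m] += 1
    return {m: sums[m] / counts[m] for m in sums}
```

Averaging per user first keeps one prolific account from defining what a whole county considers #healthy.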
• Social media can be a source of cheap training
data
• Images provide alternative “view” to hashtags
observations
Social Media Images Health
Social Media Image Analysis for Public Health
Garimella, Alfayad, Weber @ CHI’16
data collection
• geo-located Instagram posts at restaurants
• mapped to US counties using Federal
Communication Commission API
• top 100 counties by image count used
• 2,000 images randomly selected for each
county
Imagga
• returns tags with a
confidence score (use
tags with at least 20%
confidence)
• tags appearing in less
than 10 counties are
ignored
• free account limit 1
image per second
try it! https://imagga.com/auto-tagging-demo
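The two filtering rules above (confidence of at least 20%, tags seen in at least 10 counties) can be sketched as below. The response layout, `result.tags[].confidence` and `result.tags[].tag.en`, is an assumption about the tagging endpoint's JSON; check it against a real response before relying on it.

```python
import json
from collections import defaultdict


def confident_tags(response_text, min_confidence=20.0):
    """Tags from one tagging response with confidence >= min_confidence.

    Assumed layout: {"result": {"tags": [{"confidence": 61.4, "tag": {"en": "..."}}]}}
    """
    payload = json.loads(response_text)
    tags = payload.get("result", {}).get("tags", [])
    return [t["tag"]["en"] for t in tags if t["confidence"] >= min_confidence]


def drop_rare_tags(tags_by_county, min_counties=10):
    """Keep only tags that appear in at least min_counties counties."""
    spread = defaultdict(set)
    for county, tags in tags_by_county.items():
        for tag in tags:
            spread[tag].add(county)
    keep = {t for t, counties in spread.items() if len(counties) >= min_counties}
    return {c: [t for t in tags if t in keep] for c, tags in tags_by_county.items()}
```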
predicting public health
Ridge regression with smoothing parameter α=0.1
U = user-provided tags, I = machine-generated tags, D = demographics
Showing correlation between predicted health statistic and known
Statistical significance of demography-only baseline
predicting public health
imagga tags user tags + demog user tags
predicting public health
Physically Inactive
top distinguishing features
Social Media Images Mental Health
What Twitter Profile and Posted Images Reveal
about Depression and Anxiety
Guntuku, Preotiuc-Pietro, Eichstaedt, Ungar @ ICWSM’19
data collection
• surveys administered on platform
Qualtrics
• demography, Beck’s Depression
Inventory
• informed consent
• Twitter handles
• 560 people with 20 or more images
posted
• + Facebook dataset, text only
high depression / low depression
image description
• Hue – Saturation – Value (HSV)
• Hue count (professional photos have fewer hues)
• 6-bin and 12-bin color histograms
• Warm & cold colors
• Aesthetics: deep CNN that produces labels such as object emphasis,
rule of thirds, symmetry, motion blur, vivid color…
• Content: Imagga tags (top 10) clustered via Normalized Pointwise
Mutual Information (NPMI)
• Content: VGG-Net image classifier for 1,000 objects
• Face features via Face++ and EmoVu APIs (emotions, smiling)
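The colour features at the top of this list need nothing beyond the standard library. A minimal sketch, assuming pixels arrive as (r, g, b) tuples in 0..255; the hue-count rule used here (bins holding more than 5% of pixels) is a simplification chosen for illustration, not the definition used in the paper.

```python
import colorsys


def hsv_features(pixels, bins=6, min_share=0.05):
    """Mean saturation and value, a normalised hue histogram, and a hue count."""
    hist = [0] * bins
    sat_sum = val_sum = 0.0
    n = 0
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        # h is in [0, 1); clamp so rounding can never index past the last bin.
        hist[min(int(h * bins), bins - 1)] += 1
        sat_sum += s
        val_sum += v
        n += 1
    if n == 0:
        raise ValueError("no pixels")
    shares = [count / n for count in hist]
    return {
        "mean_saturation": sat_sum / n,
        "mean_value": val_sum / n,
        "hue_histogram": shares,
        "hue_count": sum(1 for share in shares if share > min_share),
    }
```

Hue is circular, so its arithmetic mean is deliberately left out; the histogram carries that information instead. Calling the function with `bins=12` gives the 12-bin variant.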
Pearson correlations
between color and
aesthetic features extracted
from images and mental
health conditions, and with
age and gender (coded as 1
for female, 0 for male)
separately. Correlations for
mental health conditions
are controlled for age,
gender and other mental
health condition. Only
significant correlations (p <
.01, two-tailed t-test) are
presented.
“while depressed users
preferred images which are
not sharp and which do not
show face, anxious users
usually chose sharper
images with multiple
people in them”
Pearson correlation between Imagga tag clusters and mental health conditions
mental health classification
Using visual features of posted images in predicting mental health conditions. Using
linear regression with ElasticNet regularization. Performance measured via Pearson
correlation (MSE in brackets). ST – single task, MT – multi-task
Additional training on text-based features (users labeled for age and gender) boosts
performance up to r = .167 for depression and r = .223 for anxiety.
observations
• easier to model populations than individuals
• easier to model behaviors than mental states
• text often describes the images (hashtags)
• getting ground truth can be difficult
Could you do it better?
Can people guess a person’s BMI better from a picture than a machine?
Face-to-BMI: Using Computer Vision to Infer Body Mass Index on Social Media
Garimella, Kocabey, Camurcu, Ofli, Aytar, Marin, Torralba, Weber @ ICWSM’17
Male
30 years old
6 feet 5 inches tall
Starting weight 385lb
Current weight 310lb
Goal weight: 225lb
data collection
• Cropping images with known gender, height, and weight from Reddit
• 4206 faces with BMI = body mass in kg / (body height in m)²
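Reddit posts like the one above report height and weight in imperial units, so the formula needs a conversion step. A small sketch (function names are illustrative):

```python
LB_TO_KG = 0.45359237  # exact definition of the avoirdupois pound
IN_TO_M = 0.0254       # exact definition of the inch


def bmi(weight_kg, height_m):
    """Body mass index: mass in kg divided by height in m squared."""
    return weight_kg / height_m ** 2


def bmi_imperial(weight_lb, feet, inches=0):
    """BMI from pounds and feet/inches."""
    return bmi(weight_lb * LB_TO_KG, (feet * 12 + inches) * IN_TO_M)
```

For the example profile (6 feet 5 inches, starting weight 385 lb), `bmi_imperial(385, 6, 5)` gives a BMI of roughly 45.7.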
face-to-BMI modeling
• Use pre-trained deep learning features:
– general object classification (VGG-Net)
– face recognition task (VGG-Face)
• Then train on the BMI dataset, test on held-out set
crowdsourcing BMI
• simpler than guessing
BMI: comparing BMI of
two pictures (M-M, F-F,
M-F)
• use Amazon
Mechanical Turk, 3
labels per task
• compared to machine,
human performance
differs by 2%
bias?
• Algorithm could be learning existing stereotypes (ex:
African Americans tend to have higher obesity rates in US)
• Try balanced set of 2000 male-female pairs, 1037 chosen
higher BMI for females (p = 0.05)
• Try balanced set of 2000 White-African American pairs,
1085 chosen higher BMI for White (p = 0.05)
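The slide does not say which test produced these p-values. One common choice for "k of n pairs went one way" is an exact one-sided binomial test against chance (0.5); the sketch below shows that test as an assumption about the procedure, not as the authors' method.

```python
from math import comb


def binomial_tail(successes, trials):
    """P(X >= successes) for X ~ Binomial(trials, 0.5), computed with exact integers."""
    favourable = sum(comb(trials, k) for k in range(successes, trials + 1))
    return favourable / 2 ** trials
```

Running `binomial_tail(1037, 2000)` and `binomial_tail(1085, 2000)` shows how far each split is from a coin flip under this particular test.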
Bonus: recipe + image
Learning Cross-modal Embeddings for Cooking Recipes and Food Images
Salvador, Hynes, Aytar, Marin, Ofli, Weber, Torralba @ CVPR'17
• combining recipe information (ingredients + instructions) with images
• crawled recipe websites, standardized the information: Recipe1M dataset
• use LSTMs for modeling ingredients and two-stage LSTMs for modeling cooking
instructions, deep convolutional networks for image representation
• Semantic Regularization learns mapping between images and food categories
Image/Recipe
Retrieval
Concept math
images
Concept math
recipes
Concept math
cross-modal
observations
• people annotate their photos in some settings
• cannot see some ingredients but may find
associations (coke -> sugar?)
2. Example Pipeline
• Getting social media data
– Ex: Twitter stream listener
• Geolocating users
– Ex: CLAVIN
• Linking to health statistics & census
– Ex: County Health Rankings & IStat
• Labeling images
– Ex: Imagga
• Linking to nutritional information
– Ex: Nutrition lexicon from recipe
• Relating diet to obesity
getting data
Twitter Streaming Pipeline
• Need to think of keywords at beginning of project
• But get a lot of data over time
• Make sure job is always running, restart if necessary, store data in small chunks
Twitter
Streaming
API client
Twitter Job Watcher crontab
Daily
dumps
Log
TwitterStreamingAPI.py
(API keys are bogus)
is the job still running?
JobWatcher.pl
what is the last file saved?
get today’s formatted date
output status
if it’s not running, start the job
if it’s running but it’s next day, restart it
process yesterday’s data
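The watcher logic listed above boils down to a small decision function that cron can call every few minutes. A Python sketch of the same idea (the original is a Perl script that is not shown here; the dump file naming `tweets_YYYY-MM-DD.json` is hypothetical):

```python
from datetime import date


def dump_day(filename):
    """Parse the day out of a hypothetical dump name like 'tweets_2019-07-23.json'."""
    stem = filename.rsplit(".", 1)[0]
    return date.fromisoformat(stem.split("_")[-1])


def watcher_action(is_running, last_dump_day, today):
    """Decide what the watcher should do on this run.

    'start'   : the job is not running
    'restart' : the job is running but still writing to an older day's file
    'ok'      : nothing to do
    """
    if not is_running:
        return "start"
    if last_dump_day is not None and last_dump_day < today:
        return "restart"
    return "ok"
```

Keeping the decision separate from the process management (checking the process list, launching the client, writing the log) makes the rules easy to test without touching a live stream.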
crontab
parseTwitterJSON.py
Watch out for
encoding
tabs
newlines
errors
There will always be errors
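A defensive parser for the daily dumps might look like the sketch below. It is not the contents of parseTwitterJSON.py; it assumes one JSON object per line and uses the `id_str`, `user.screen_name`, and `text` fields of the classic tweet object.

```python
import json


def clean(text):
    """Collapse tabs, newlines, and repeated spaces so the value fits one TSV cell."""
    return " ".join(str(text).split())


def parse_lines(lines):
    """Return (rows, error_count); rows are (tweet_id, screen_name, text) tuples.

    Lines that are not valid JSON, or that lack the expected fields
    (for example delete notices in a stream), are counted and skipped.
    """
    rows, errors = [], 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            tweet = json.loads(line)
            rows.append(
                (tweet["id_str"], tweet["user"]["screen_name"], clean(tweet["text"]))
            )
        except (ValueError, KeyError, TypeError):
            errors += 1
    return rows, errors
```

For the encoding issue, opening the dump with `open(path, encoding="utf-8", errors="replace")` keeps one bad byte from aborting a whole day's file.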
using non-streaming APIs
• careful with rate limits
• restrictions on amount of data (back in time,
number of items)
• deleted content or accounts are not there
• get historical interactions with content (likes)
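Staying under a rate limit is mostly a matter of spacing requests. A minimal client-side throttle, with the interval left as a parameter because each API publishes its own limits:

```python
import time


class Throttle:
    """Block so that consecutive calls to wait() are at least min_interval seconds apart."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Call `throttle.wait()` immediately before each request. The injectable `clock` and `sleep` exist only so the class can be tested without real delays.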
geolocation
geo-location: CLAVIN
• https://github.com/Berico-Technologies/CLAVIN
• uses allCountries.zip gazetteer file from GeoNames.org
The Somali Ministry of Information, Posts and Telecommunications started the process of distributing
6,000 hand-held radios to Internally Displaced Persons (IDPs) in Mogadishu. In the first batch, the
Ministry handed out 1,000 radios at Badbado camp, Somalia's largest IDP camp. The radios were
received by to those most in need: namely, female-headed households, elderly and youth groups.
The beneficiaries will receive news and important information concerning relief efforts and public
safety messages daily. The small emergency radios are both solar-powered and hand-cranked and can
also operate with batteries. The radios can be tuned in to multiple frequencies.
The Deputy Minister of Information, Posts and Telecommunications, H.E. Abdullahi Bile Nur, who
witnessed the distribution process at Badbado camp, said: "In any emergency, the first priority is the
delivery of critical aid, but communities need more than that. They also need information. It is
important for them to know where they can get water, where they can get certain facilities, how to
access those facilities."
"We believe the radios will make a difference in terms of morale and education;" he added. Radio
Mogadishu broadcasts a daily show named 'Recovery' (previously 'Help') that is packaged along with
the latest announcements and information from humanitarian agencies. The program offers guidance
on hygiene and sanitation, nutrition, child education, good neighborhood, becoming productive
members of the society, among other key topics.
Somalia’s Prime Minister Dr. Abdiweli Mohamed Ali received the European Union envoy to Somalia
Alexander Rondes in his office in Mogadishu today. The envoy was accompanied by EU officials and
others while Somali minister for defence Hussein Arab Isse and minister for foreign affairs were also
present in the meeting.
The premier warmly received the EU envoy and thanked him for visiting Mogadishu. He requested the
EU to double its efforts in the restoration of peace and stability in Somalia.
The two leaders discussed the strategic plans of setting up control and authority in the areas reclaimed
from Al-shabab in order to deliver the much needed public services and humanitarian aid to the
people.
The meeting by the premier and the EU envoy also highlighted the upcoming London meeting which
aims at delivering a new international approach to Somalia. The premier stated that the prevalent
security made by the government provides an opportune time for consolidation of such gains.
The special envoy, who was appointed by EU to represent the horn of Africa region, stated that he will
give priority to Somalia. It seems the international community is making concerted effort to the
Somalia issue after the government made important strides in security, the new constitution,
restructuring the parliament, local administrations’ cooperation and good governance.
Resolved "European Union" as: "European Union" {European Union (No Man's Land, )
[pop: 0] <6695072>}, position: 1590, confidence: 1.000000, fuzzy: false
[50.83857,4.37599]
Resolved "London" as: "london"
{com.bericotech.clavin.gazetteer.LazyAncestryGeoName@285818}, position: 2306,
confidence: 1.000000, fuzzy: false
[51.50853,-0.12574]
Resolved "Africa" as: "Africa" {Africa (No Man's Land, ) [pop: 1031833000]
<6255146>}, position: 2586, confidence: 1.000000, fuzzy: false
[7.1881,21.09375]
Resolved "Muna" as: "Muna"
{com.bericotech.clavin.gazetteer.LazyAncestryGeoName@35c23f}, position: 2957,
confidence: 1.000000, fuzzy: false
[20.48794,-89.71387]
Resolved "Mataban" as: "Mataban"
{com.bericotech.clavin.gazetteer.LazyAncestryGeoName@d460}, position: 4161,
confidence: 1.000000, fuzzy: false
[5.20401,45.53353]
Resolved "Birmingham" as: "Birmingham"
{com.bericotech.clavin.gazetteer.LazyAncestryGeoName@28866c}, position: 4841,
confidence: 1.000000, fuzzy: false
[52.48142,-1.89983]
Resolved "Hiran" as: "Hiran"
{com.bericotech.clavin.gazetteer.LazyAncestryGeoName@7b1830}, position: 5208,
confidence: 1.000000, fuzzy: false
[14.44998,45.57068]
Resolved "London" as: "london"
{com.bericotech.clavin.gazetteer.LazyAncestryGeoName@285818}, position: 6129,
confidence: 1.000000, fuzzy: false
[51.50853,-0.12574]
Resolved "East Africa" as: "Portuguese East Africa" {Mozambique (Mozambique, 00)
[pop: 22061451] <1036973>}, position: 7589, confidence: 1.000000, fuzzy: false
[-18.25,35.0]
In: Out:
Check Quality of Matching for Most Frequent Terms!
GPS to FIPS (USA)
often need smaller
administrative unit ID
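US Census identifiers are nested: a county code is five digits (two for the state, three for the county) and a census tract GEOID is eleven (the county code plus six tract digits). A small sketch for moving between levels and for repairing codes that lost leading zeros when read as numbers:

```python
def normalize_fips(value, width=5):
    """Restore leading zeros lost when a code was parsed as an integer."""
    return str(value).strip().zfill(width)


def split_geoid(geoid):
    """Split a 5-digit county code or 11-digit tract GEOID into its levels."""
    if not geoid.isdigit() or len(geoid) not in (5, 11):
        raise ValueError("expected a 5- or 11-digit code")
    parts = {"state": geoid[:2], "county": geoid[:5]}
    if len(geoid) == 11:
        parts["tract"] = geoid
    return parts
```

Keeping these codes as strings end to end avoids most join failures against health and census tables.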
health statistics
Linking to health statistics
Linking to personal health
• Crowdsourcing
• Personal health,
demographics,
beliefs…
• Experimentation
• IRB approval!
image processing
Image processing: Imagga
curl -u "acc_3d8aa27c6368803:be371863f7b7e4715d17a75ff759f44d" \
  "https://api.imagga.com/v2/tags?image_url=https://s.yimg.com/ge/labs/v1/uploads/20131031_093854-300x225.jpg" \
  > imagga_example.out
Face Plus Plus
Face Detect
Face Analyze
curl -X POST "https://api-us.faceplusplus.com/facepp/v3/face/analyze" \
  -F "api_key=Cc9Pnq-G01FVr5FO1unI-CtQMcPVl9YM" \
  -F "api_secret=VwWy_pcnyOdvP0I-0GBbBiaop1b-TjHD" \
  -F "return_landmark=1" \
  -F "return_attributes=gender,age,smiling,emotion,ethnicity,beauty,mouthstatus,skinstatus" \
  -F "face_tokens=0221733fc86b65b9188e26f6c7b85c56"
nutritional info
USDA
• Huge list
• Includes producer
• Too specific
• Repetitive
• Formal language
Recipes
• Also huge
• Human language
• No nutrient info
• Social info
Recipes + Ingredients + Nutrients
• Remove or flag
popular ambiguous
terms: honey, apple,
salsa, …
• Remember: food
density, not portions!
put it together
https://foodporn.qcri.org/
Fetishizing food in digital age: #Foodporn around the world.
Mejova, Abbar, Haddadi @ ICWSM'16
@yelenamejova
yelenamejova.com
yelena.mejova@isi.it
Slides:
slideshare.net

More Related Content

Similar to Social Media for Lifestyle Health: Multimedia

Univ 291 mercy housing lakefront final presentation!
Univ 291   mercy housing lakefront final presentation!Univ 291   mercy housing lakefront final presentation!
Univ 291 mercy housing lakefront final presentation!
msullivan4
 
Univ 291 -_mercy_housing_lakefront_final_presentation
Univ 291 -_mercy_housing_lakefront_final_presentationUniv 291 -_mercy_housing_lakefront_final_presentation
Univ 291 -_mercy_housing_lakefront_final_presentation
fogutu
 
Social Media Research and Practice in the Health Domain - Tutorial, Part II
Social Media Research and Practice in the Health Domain - Tutorial, Part IISocial Media Research and Practice in the Health Domain - Tutorial, Part II
Social Media Research and Practice in the Health Domain - Tutorial, Part II
Ingmar Weber
 
Digital Demography - WWW'17 Tutorial - Part II
Digital Demography - WWW'17 Tutorial - Part IIDigital Demography - WWW'17 Tutorial - Part II
Digital Demography - WWW'17 Tutorial - Part II
Ingmar Weber
 
Sdal air health and social development (jan. 27, 2014) final
Sdal air health and social development (jan. 27, 2014) finalSdal air health and social development (jan. 27, 2014) final
Sdal air health and social development (jan. 27, 2014) final
kimlyman
 
Needs assessment training cycle iv
Needs assessment training cycle ivNeeds assessment training cycle iv
Needs assessment training cycle iv
Twyla Baker-Demaray
 
Combining Behaviors and Demographics to Segment Online Audiences:Experiments ...
Combining Behaviors and Demographics to Segment Online Audiences:Experiments ...Combining Behaviors and Demographics to Segment Online Audiences:Experiments ...
Combining Behaviors and Demographics to Segment Online Audiences:Experiments ...
Joni Salminen
 
Gap Calculator
Gap CalculatorGap Calculator
Gap Calculator
Goodzuma
 
AVRC Community Based HIV and Aging
AVRC Community Based HIV and AgingAVRC Community Based HIV and Aging
AVRC Community Based HIV and Aging
UC San Diego AntiViral Research Center
 
ECIR2017-Inferring User Interests for Passive Users on Twitter by Leveraging ...
ECIR2017-Inferring User Interests for Passive Users on Twitter by Leveraging ...ECIR2017-Inferring User Interests for Passive Users on Twitter by Leveraging ...
ECIR2017-Inferring User Interests for Passive Users on Twitter by Leveraging ...
GUANGYUAN PIAO
 
Foodprint UX Report
Foodprint UX ReportFoodprint UX Report
Foodprint UX Report
Peony Trinh
 
Logging in 3 communities - lightning talk festivIL 2021
Logging in 3 communities - lightning talk festivIL 2021Logging in 3 communities - lightning talk festivIL 2021
Logging in 3 communities - lightning talk festivIL 2021
Pamela McKinney
 
Perceived versus Actual Predictability of Personal Information in Social Netw...
Perceived versus Actual Predictability of Personal Information in Social Netw...Perceived versus Actual Predictability of Personal Information in Social Netw...
Perceived versus Actual Predictability of Personal Information in Social Netw...
Symeon Papadopoulos
 
Perceived versus Actual Predictability of Personal Information in Social Netw...
Perceived versus Actual Predictability of Personal Information in Social Netw...Perceived versus Actual Predictability of Personal Information in Social Netw...
Perceived versus Actual Predictability of Personal Information in Social Netw...
Eleftherios Spyromitros-Xioufis
 
Demographic Targeting Insights
Demographic Targeting InsightsDemographic Targeting Insights
Demographic Targeting Insights
Daniel McKean
 
Health Datapalooza 2013: Datalab - Robert Post
Health Datapalooza 2013: Datalab - Robert PostHealth Datapalooza 2013: Datalab - Robert Post
Health Datapalooza 2013: Datalab - Robert Post
Health Data Consortium
 
Colloque RI 2014 : Intervention de Ross C. BROWNSON (Washington University in...
Colloque RI 2014 : Intervention de Ross C. BROWNSON (Washington University in...Colloque RI 2014 : Intervention de Ross C. BROWNSON (Washington University in...
Colloque RI 2014 : Intervention de Ross C. BROWNSON (Washington University in...
Institut national du cancer
 
Grant Writing and Reporting
Grant Writing and ReportingGrant Writing and Reporting
Grant Writing and Reporting
Healthy City
 
Vera Data 2.0 Slides and Script
Vera Data 2.0 Slides and ScriptVera Data 2.0 Slides and Script
Vera Data 2.0 Slides and Script
Sara Vera
 
Stakeholders of Organic Products in Mexico and Korea
Stakeholders of Organic Products in Mexico and KoreaStakeholders of Organic Products in Mexico and Korea
Stakeholders of Organic Products in Mexico and Korea
Xanat V. Meza
 

Similar to Social Media for Lifestyle Health: Multimedia (20)

Univ 291 mercy housing lakefront final presentation!
Univ 291   mercy housing lakefront final presentation!Univ 291   mercy housing lakefront final presentation!
Univ 291 mercy housing lakefront final presentation!
 
Univ 291 -_mercy_housing_lakefront_final_presentation
Univ 291 -_mercy_housing_lakefront_final_presentationUniv 291 -_mercy_housing_lakefront_final_presentation
Univ 291 -_mercy_housing_lakefront_final_presentation
 
Social Media Research and Practice in the Health Domain - Tutorial, Part II
Social Media Research and Practice in the Health Domain - Tutorial, Part IISocial Media Research and Practice in the Health Domain - Tutorial, Part II
Social Media Research and Practice in the Health Domain - Tutorial, Part II
 
Digital Demography - WWW'17 Tutorial - Part II
Digital Demography - WWW'17 Tutorial - Part IIDigital Demography - WWW'17 Tutorial - Part II
Digital Demography - WWW'17 Tutorial - Part II
 
Sdal air health and social development (jan. 27, 2014) final
Sdal air health and social development (jan. 27, 2014) finalSdal air health and social development (jan. 27, 2014) final
Sdal air health and social development (jan. 27, 2014) final
 
Needs assessment training cycle iv
Needs assessment training cycle ivNeeds assessment training cycle iv
Needs assessment training cycle iv
 
Combining Behaviors and Demographics to Segment Online Audiences:Experiments ...
Combining Behaviors and Demographics to Segment Online Audiences:Experiments ...Combining Behaviors and Demographics to Segment Online Audiences:Experiments ...
Combining Behaviors and Demographics to Segment Online Audiences:Experiments ...
 
Gap Calculator
Gap CalculatorGap Calculator
Gap Calculator
 
AVRC Community Based HIV and Aging
AVRC Community Based HIV and AgingAVRC Community Based HIV and Aging
AVRC Community Based HIV and Aging
 
ECIR2017-Inferring User Interests for Passive Users on Twitter by Leveraging ...
ECIR2017-Inferring User Interests for Passive Users on Twitter by Leveraging ...ECIR2017-Inferring User Interests for Passive Users on Twitter by Leveraging ...
ECIR2017-Inferring User Interests for Passive Users on Twitter by Leveraging ...
 
Foodprint UX Report
Foodprint UX ReportFoodprint UX Report
Foodprint UX Report
 
Logging in 3 communities - lightning talk festivIL 2021
Logging in 3 communities - lightning talk festivIL 2021Logging in 3 communities - lightning talk festivIL 2021
Logging in 3 communities - lightning talk festivIL 2021
 
Perceived versus Actual Predictability of Personal Information in Social Netw...
Perceived versus Actual Predictability of Personal Information in Social Netw...Perceived versus Actual Predictability of Personal Information in Social Netw...
Perceived versus Actual Predictability of Personal Information in Social Netw...
 
Perceived versus Actual Predictability of Personal Information in Social Netw...
Perceived versus Actual Predictability of Personal Information in Social Netw...Perceived versus Actual Predictability of Personal Information in Social Netw...
Perceived versus Actual Predictability of Personal Information in Social Netw...
 
Demographic Targeting Insights
Demographic Targeting InsightsDemographic Targeting Insights
Demographic Targeting Insights
 
Health Datapalooza 2013: Datalab - Robert Post
Health Datapalooza 2013: Datalab - Robert PostHealth Datapalooza 2013: Datalab - Robert Post
Health Datapalooza 2013: Datalab - Robert Post
 
Colloque RI 2014 : Intervention de Ross C. BROWNSON (Washington University in...
Colloque RI 2014 : Intervention de Ross C. BROWNSON (Washington University in...Colloque RI 2014 : Intervention de Ross C. BROWNSON (Washington University in...
Colloque RI 2014 : Intervention de Ross C. BROWNSON (Washington University in...
 
Grant Writing and Reporting
Grant Writing and ReportingGrant Writing and Reporting
Grant Writing and Reporting
 
Vera Data 2.0 Slides and Script
Vera Data 2.0 Slides and ScriptVera Data 2.0 Slides and Script
Vera Data 2.0 Slides and Script
 
Stakeholders of Organic Products in Mexico and Korea
Stakeholders of Organic Products in Mexico and KoreaStakeholders of Organic Products in Mexico and Korea
Stakeholders of Organic Products in Mexico and Korea
 

More from Yelena Mejova

Modeling Human Values with Social Media
Modeling Human Values with Social MediaModeling Human Values with Social Media
Modeling Human Values with Social Media
Yelena Mejova
 
Capturing social media signals for health research
Capturing social media signals for health researchCapturing social media signals for health research
Capturing social media signals for health research
Yelena Mejova
 
Information Sources and Needs in the Obesity and Diabetes Twitter Discourse
Information Sources and Needs in the Obesity and DiabetesTwitter DiscourseInformation Sources and Needs in the Obesity and DiabetesTwitter Discourse
Information Sources and Needs in the Obesity and Diabetes Twitter Discourse
Yelena Mejova
 
Social Media and Tech for Health Research
Social Media and Tech for Health ResearchSocial Media and Tech for Health Research
Social Media and Tech for Health Research
Yelena Mejova
 
Social Medial for Health Research: Interventions
Social Medial for Health Research: InterventionsSocial Medial for Health Research: Interventions
Social Medial for Health Research: Interventions
Yelena Mejova
 
Language of Politics on Twitter - 03 Analysis
Language of Politics on Twitter - 03 AnalysisLanguage of Politics on Twitter - 03 Analysis
Language of Politics on Twitter - 03 Analysis
Yelena Mejova
 
Language of Politics on Twitter - 02 Twitter
Language of Politics on Twitter - 02 TwitterLanguage of Politics on Twitter - 02 Twitter
Language of Politics on Twitter - 02 Twitter
Yelena Mejova
 
Language of Politics on Twitter - 01 Political
Language of Politics on Twitter - 01 PoliticalLanguage of Politics on Twitter - 01 Political
Language of Politics on Twitter - 01 Political
Yelena Mejova
 
Religion on Social Media ICWSM 2015 Workshop Introduction
Religion on Social Media ICWSM 2015 Workshop IntroductionReligion on Social Media ICWSM 2015 Workshop Introduction
Religion on Social Media ICWSM 2015 Workshop Introduction
Yelena Mejova
 
Giving is Caring: Understanding Donation Behavior through Email

Social Media for Lifestyle Health: Multimedia

  • 1. Social Media for Lifestyle Health Yelena Mejova Summer School Series on Methods for Computational Social Science July 23, 2019
  • 6. Social Media Lifestyle Health 1. What’s been done 2. Example pipeline
  • 7. Social Media Obesity You Tweet What You Eat: Studying Food Consumption Through Twitter Sofiane Abbar, Yelena Mejova, Ingmar Weber @ CHI'15
  • 8. data / topic refinement food lexicon streaming API geo-located users user histories Twitter food lexicon 1,2,3… Twitter food dataset
  • 9. quantification • estimating distribution of calories • not per-item precision • mean descriptors
  • 10. geo-location • GPS tagging of tweets (<5%) • User location strings (up to 65%) • Locations mentioned in tweets (unreliable)
  • 11. population level • average caloric value of all foods mentioned in a tweet (exact keyword matching) vs obesity rates from the Centers for Disease Control and Prevention (CDC) • food keyword frequency vs obesity: r = 0.56*** • final model: demographics + detected foods
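Slide 11 compares the mean caloric value of foods mentioned in tweets with regional obesity rates. The following is a minimal sketch of that idea, not the authors' code: the lexicon entries and calorie values are illustrative placeholders, matching is restricted to single-word keywords, and the function names are hypothetical.

```python
import math
import re

# Hypothetical lexicon: food keyword -> kcal per 100 g (illustrative values only).
FOOD_KCAL = {"pizza": 266.0, "salad": 20.0, "burger": 295.0, "apple": 52.0}


def tweet_calories(text, lexicon=FOOD_KCAL):
    """Mean caloric value of all lexicon foods mentioned in one tweet, or None."""
    tokens = re.findall(r"[a-z]+", text.lower())
    values = [lexicon[t] for t in tokens if t in lexicon]  # exact keyword match
    return sum(values) / len(values) if values else None


def region_mean_calories(tweets, lexicon=FOOD_KCAL):
    """Average the per-tweet values over all tweets that mention any food."""
    values = [c for c in (tweet_calories(t, lexicon) for t in tweets) if c is not None]
    return sum(values) / len(values) if values else None


def pearson_r(xs, ys):
    """Pearson correlation between two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

In use, `region_mean_calories` would be computed once per state or county and passed to `pearson_r` together with the matching obesity rates.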
  • 12. • Education & Income: – US Census at ZIP code level • Gender: – genderize.io 37% f / 32% m – neither: – 26.7% not human, excluded
  • 13. crowdsourcing • Got a complicated task computational tools cannot handle (well enough)? • Break it into small pieces and have random people on the internet do it • Cheap per task • Several labels per task for quality assessment • Are people any good at your task? (upper bound)
  • 15. individual level • #fat*problems self-disclosure of weight (possibly) • High-precision keyword detection of interests/conditions in user profiles change in obesity rate
  • 16. individual level • WeFollow prominent user lists, followership as proxy for interest • top 15 factors by magnitude of coefficient
  • 17. network level • Friendship & Mention networks • Social activation: users above 90th percentile in terms of obesity and/or diabetes score (personalized using Ridge regression on foods a user has mentioned) • Threshold model: success of a social diffusion process depends on reaching a certain critical number of adopters • Activation probability given x of your neighbors are activated users
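The activation probability on slide 17 can be estimated empirically from a network and a set of activated users. This is a rough sketch under stated assumptions: the set of activated users (for example those above the 90th percentile of the obesity score) is computed upstream, edges are treated as undirected, and the function name is hypothetical.

```python
from collections import defaultdict


def activation_by_neighbors(edges, activated):
    """Map x -> share of users with exactly x activated neighbors who are activated."""
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    totals = defaultdict(int)  # users with x activated neighbors
    hits = defaultdict(int)    # of those, users who are activated themselves
    for user, nbrs in neighbors.items():
        x = len(nbrs & activated)
        totals[x] += 1
        if user in activated:
            hits[x] += 1
    return {x: hits[x] / totals[x] for x in sorted(totals)}
```

Plotting the returned values against x gives the kind of curve a threshold model predicts: a jump once a critical number of neighbors is activated.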
  • 18. Friendship Network Mention Network network level • Content spread: remove replies & retweets • Geography: remove links from same state
  • 19. Social Media Food Deserts Characterizing dietary choices, nutrition, and language in food deserts via social media De Choudhury, Sharma, Kiciman @ CSCW'16
  • 20. Food desert: – Low-income census tracts with a substantial number or share of residents with low levels of access to retail outlets selling healthy and affordable foods • an estimated 13.5 million people in the United States have low access to a supermarket or large grocery store, with 82 percent living in urban areas. https://www.ers.usda.gov/data-products/food-access-research-atlas/documentation/
  • 22. data collection canonical food lexicon geo-located posts census tracts 14 million posts 8 million users July 2013 - March 2015 35.5% posts with geo-location tags
  • 23. geo-location 2 • knowing long/lat of content • obtain mapping to US Census tract
  • 24. • need to control for confounding variables when comparing to a control set • control / treatment
  • 25. matching • Socioeconomic variables: population, % minority population, #households, #families, % non-Hispanic whites, median house age, median family income, owner occupied housing units, distressed/underserved tract • Similarity via Mahalanobis distance • Selection via k Nearest Neighbors • Discard FD tracts without good enough matches
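The matching step on slide 25 (Mahalanobis distance, k nearest neighbors, discarding poorly matched tracts) could look roughly like the sketch below. It is an illustration only: tracts are assumed to be plain covariate vectors, the covariance is pooled over all tracts, and `max_dist` is a hypothetical cutoff rather than the paper's setting.

```python
import math


def covariance(rows):
    """Sample covariance matrix of a list of equally long vectors."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(d)] for i in range(d)]


def invert(m):
    """Gauss-Jordan inverse with partial pivoting (fine for a handful of covariates)."""
    d = len(m)
    aug = [list(row) + [float(i == j) for j in range(d)] for i, row in enumerate(m)]
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(d):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[d:] for row in aug]


def mahalanobis(x, y, inv_cov):
    diff = [a - b for a, b in zip(x, y)]
    d = len(diff)
    q = sum(diff[i] * inv_cov[i][j] * diff[j] for i in range(d) for j in range(d))
    return math.sqrt(max(q, 0.0))


def match_tracts(food_deserts, candidates, k=2, max_dist=None):
    """Both arguments map tract id -> covariate vector; returns FD id -> matched ids."""
    pooled = list(food_deserts.values()) + list(candidates.values())
    inv_cov = invert(covariance(pooled))
    matches = {}
    for fd_id, fd_vec in food_deserts.items():
        ranked = sorted((mahalanobis(fd_vec, vec, inv_cov), cid)
                        for cid, vec in candidates.items())
        kept = [cid for dist, cid in ranked[:k] if max_dist is None or dist <= max_dist]
        if kept:  # food deserts without a good enough match are discarded
            matches[fd_id] = kept
    return matches
```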
  • 27. • differences in nutrition between FD and matched NFD can be big • but they also vary across the region Statistical significance in nutritional attributes of FDs and NFDs, with Bonferroni adjustment of α
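The Bonferroni adjustment mentioned on slide 27 divides the significance level α by the number of tests. A small sketch, with hypothetical attribute names and p-values:

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag each test as significant at the Bonferroni-adjusted level alpha / m."""
    adjusted_alpha = alpha / len(p_values)
    return {name: p <= adjusted_alpha for name, p in p_values.items()}


# Hypothetical p-values for three nutritional attributes.
flags = bonferroni_significant({"sugar": 0.001, "fat": 0.02, "fiber": 0.04})
```

With three tests the adjusted threshold is 0.05 / 3, so only the first attribute would remain significant in this made-up example.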
  • 28. predicting whether a tract is a food desert • S = socioeconomic • F = food deprivation • T = topic distribution classification Instagram topic information helps!
  • 29. error analysis • social media helps identify recent developments like gentrification Atlanta, Georgia Popular topics: “smoothie”, “organic”, “farmtotable”, “baking”
  • 30. Social Media Images Disclosure Is #Saki Delicious?: The Food Perception Gap on Instagram and Its Relation to Health Ofli, Aytar, Weber, al Hammouri, Torralba @ WWW'17
  • 31. • most studies take hashtags as true description of image content, but are they?
  • 32. data collection • query: #food, #foodporn, #foodie, #breakfast, #lunch, #dinner • 72M images, 26M with location • 4M assigned to US county • 3.7M on hashtags canonical food lexicon food-related posts geo-localized food-related posts 10,000 hashtags as food-related
  • 33. geo-location 3 • Geo-location: – county shape files from US Census • https://www.census.gov/programs-surveys/geography.html – shapely python library • https://github.com/Toblerity/Shapely
  • 34. image labeling • Deep residual network – “can train substantially deeper models” – He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 770-778). – In the training procedure, the final 1000-way softmax in the deep residual model is replaced with a 101-way softmax, and the model is fine-tuned on the Insta-101 and Food-101 datasets individually
  • 35. image labeling L. Bossard, M. Guillaumin, and L. Van Gool. Food-101–mining discriminative components with random forests. In European Conference on Computer Vision, 2014. 101 food categories, 101,000 images (750 train / 250 test) Instagram images matched manually to Food-101 categories (4000 train / 250 test, no manual cleaning)
  • 36. perception gap • difference in how a machine and a human annotate a given image • then aggregated for user (no one user contributes too much), then per county
  • 37. perception gap • machine is more likely than human to label post #instabeer in places where there is higher food insecurity • machine is less likely than human to label post #sushiroll in places where there is higher food insecurity • machine is less likely than human to label post #chicagopizza in places where there are more alcohol-related driving deaths
  • 38. variation in subjective labels • for a subjective user tag j, and each machine tag i, compute P(j|i) • computed first for a user, then aggregated per county • focus on #healthy, #delicious, #organic
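Slide 38 computes P(j|i) per user and then aggregates per county. A minimal sketch of that two-stage computation follows; the input layout (one machine tag plus a set of user hashtags per post) and the use of a plain mean across users are assumptions made for illustration.

```python
from collections import defaultdict


def user_conditional(posts, user_tag):
    """posts: list of (machine_tag, set_of_user_hashtags) for ONE user.

    Returns {machine tag i: P(user_tag | i)} for that user.
    """
    seen = defaultdict(int)
    both = defaultdict(int)
    for machine_tag, hashtags in posts:
        seen[machine_tag] += 1
        if user_tag in hashtags:
            both[machine_tag] += 1
    return {i: both[i] / seen[i] for i in seen}


def county_conditional(users_posts, user_tag):
    """Average the per-user probabilities so prolific users do not dominate."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for posts in users_posts:
        for i, p in user_conditional(posts, user_tag).items():
            sums[i] += p
            counts[i] += 1
    return {i: sums[i] / counts[i] for i in sums}
```

Calling `county_conditional(users_posts, "healthy")` for every county gives one value per machine tag that can then be compared with county health statistics.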
  • 41. • Social media can be a source of cheap training data • Images provide alternative “view” to hashtags observations
  • 42. Social Media Images Health Social Media Image Analysis for Public Health Garimella, Alfayad, Weber @ CHI’16
  • 43. data collection • geo-located Instagram posts at restaurants • mapped to US counties using Federal Communication Commission API • top 100 counties by image count used • 2,000 images randomly selected for each county
  • 44. Imagga • returns tags with a confidence score (use tags with at least 20% confidence) • tags appearing in fewer than 10 counties are ignored • free account limit: 1 image per second • try it! https://imagga.com/auto-tagging-demo
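The two filtering rules on slide 44 can be applied after tagging, without touching the tagging service itself. The sketch below assumes tags have already been downloaded into a hypothetical nested structure and that confidence is reported on a 0 to 100 scale; it does not call the Imagga API.

```python
from collections import defaultdict


def filter_tags(county_images, min_confidence=20.0, min_counties=10):
    """county_images: {county: [[(tag, confidence), ...] for each image]}.

    Keeps tags with enough confidence that also occur in enough counties.
    """
    # Rule 1: drop low-confidence tags.
    confident = {
        county: [[t for t, c in image if c >= min_confidence] for image in images]
        for county, images in county_images.items()
    }
    # Rule 2: drop tags seen in fewer than `min_counties` counties.
    counties_with_tag = defaultdict(set)
    for county, images in confident.items():
        for tags in images:
            for t in tags:
                counties_with_tag[t].add(county)
    frequent = {t for t, cs in counties_with_tag.items() if len(cs) >= min_counties}
    return {
        county: [[t for t in tags if t in frequent] for tags in images]
        for county, images in confident.items()
    }
```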
  • 45. predicting public health Ridge regression with smoothing parameter α=0.1 U = user-provided tags, I = machine-generated tags, D = demographics Showing correlation between predicted health statistic and known Statistical significance of demography-only baseline
  • 46. predicting public health imagga tags user tags + demog user tags
  • 47. predicting public health Physically Inactive top distinguishing features
  • 48. Social Media Images Mental Health What Twitter Profile and Posted Images Reveal about Depression and Anxiety Guntuku, Preotiuc-Pietro, Eichstaedt, Ungar @ ICWSM’19
  • 49. data collection • surveys administered on platform Qualtrics • demography, Beck’s Depression Inventory • informed consent • Twitter handles • 560 people with 20 or more images posted • + Facebook dataset, text only • high depression / low depression
  • 50. image description • Hue – Saturation – Value (HSV) • Hue count (professional photos have fewer hues) • 6-bin and 12-bin color histograms • Warm & cold colors • Aesthetics: deep CNN that produces labels such as object emphasis, rule of thirds, symmetry, motion blur, vivid color… • Content: Imagga tags (top 10) clustered via Normalized Pointwise Mutual Information (NPMI) • Content: VGG-Net image classifier for 1,000 objects • Face features via Face++ and EmoVu APIs (emotions, smiling)
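Some of the color features listed on slide 50 can be computed with the standard library alone. This is a simplified sketch, not the paper's feature extractor: pixels are assumed to be already decoded into RGB triples, the hue count is taken as the number of non-empty histogram bins, and the saturation and value cutoffs are illustrative.

```python
import colorsys


def color_features(pixels, hue_bins=6, min_sat=0.2, min_val=0.15):
    """pixels: list of (r, g, b) tuples with channels in 0..255."""
    hsv = [colorsys.rgb_to_hsv(r / 255, g / 255, b / 255) for r, g, b in pixels]
    n = len(hsv)
    histogram = [0] * hue_bins
    for h, s, v in hsv:
        # Hue is unreliable for gray or very dark pixels, so skip them.
        if s >= min_sat and v >= min_val:
            histogram[min(int(h * hue_bins), hue_bins - 1)] += 1
    return {
        "mean_saturation": sum(s for _, s, _ in hsv) / n,
        "mean_value": sum(v for _, _, v in hsv) / n,
        "hue_histogram": histogram,
        "hue_count": sum(1 for count in histogram if count > 0),
    }
```

Setting `hue_bins=12` gives the 12-bin histogram variant mentioned on the slide.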
  • 51. Pearson correlations between color and aesthetic features extracted from images and mental health conditions, and with age and gender (coded as 1 for female, 0 for male) separately. Correlations for mental health conditions are controlled for age, gender and other mental health condition. Only significant correlations (p < .01, two-tailed t-test) are presented.
  • 52. “while depressed users preferred images which are not sharp and which do not show face, anxious users usually chose sharper images with multiple people in them”
  • 53. Pearson correlation between Imagga tag clusters and mental health conditions
  • 54. mental health classification Using visual features of posted images in predicting mental health conditions. Using linear regression with ElasticNet regularization. Performance measured via Pearson correlation (MSE in brackets). ST – single task, MT – multi-task Additional training on text-based features (users labeled for age and gender) boosts performance up to r = .167 for depression and r = .223 for anxiety.
  • 55. observations • easier to model populations than individuals • easier to model behaviors than mental states • text often describes the images (hashtags) • getting ground truth can be difficult
  • 56. Could you do it better? Can people guess a person’s BMI better from a picture than a machine? Face-to-BMI: Using Computer Vision to Infer Body Mass Index on Social Media Garimella, Kocabey, Camurcu, Ofli, Aytar, Marin, Torralba, Weber @ ICWSM’17
  • 57. Male 30 years old 6 feet 5 inches tall Starting weight 385lb Current weight 310lb Goal weight: 225lb
  • 58. data collection • Cropping images with known gender, height, and weight from Reddit • 4206 faces with BMI = body mass in kg / (body height in m)²
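The BMI definition on slide 58 uses metric units, while posts such as the one on slide 57 report pounds, feet, and inches. A short sketch of the conversion (function names are hypothetical):

```python
LB_TO_KG = 0.45359237  # exact definition of the avoirdupois pound
IN_TO_M = 0.0254       # exact definition of the inch


def bmi(mass_kg, height_m):
    """Body mass index: mass in kg divided by squared height in m."""
    return mass_kg / height_m ** 2


def bmi_from_imperial(weight_lb, feet, inches):
    height_m = (feet * 12 + inches) * IN_TO_M
    return bmi(weight_lb * LB_TO_KG, height_m)


# Values from the example post on slide 57 (6 feet 5 inches tall).
start_bmi = bmi_from_imperial(385, 6, 5)
current_bmi = bmi_from_imperial(310, 6, 5)
```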
  • 59. face-to-BMI modeling • Use pre-trained deep learning features: – general object classification (VGG-Net) – face recognition task (VGG-Face) • Then train on the BMI dataset, test on held-out set
  • 60. crowdsourcing BMI • simpler than guessing BMI: comparing BMI of two pictures (M-M, F-F, M-F) • use Amazon Mechanical Turk, 3 labels per task • compared to machine, human performance differs by 2%
  • 61. bias? • Algorithm could be learning existing stereotypes (ex: African Americans tend to have higher obesity rates in US) • Try balanced set of 2000 male-female pairs, 1037 chosen higher BMI for females (p = 0.05) • Try balanced set of 2000 White-African American pairs, 1085 chosen higher BMI for White (p = 0.05)
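The slide does not say which statistical test was applied to the balanced pairs on slide 61. One plausible check, shown here purely as an assumption, is an exact one-sided sign test against the null hypothesis that either member of a pair is equally likely to be assigned the higher BMI.

```python
from math import comb


def sign_test_p(successes, trials):
    """Exact one-sided P(X >= successes) for X ~ Binomial(trials, 0.5).

    Integer arithmetic avoids the float underflow of 0.5 ** trials for large trials.
    """
    tail = sum(comb(trials, k) for k in range(successes, trials + 1))
    return tail / 2 ** trials


p_gender = sign_test_p(1037, 2000)  # pairs where the female face got the higher BMI
p_race = sign_test_p(1085, 2000)    # pairs where the White face got the higher BMI
```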
  • 62. Bonus: recipe + image Learning Cross-modal Embeddings for Cooking Recipes and Food Images Salvador, Hynes, Aytar, Marin, Ofli, Weber, Torralba @ CVPR'17
  • 63. • combining recipe information (ingredients + instructions) with images • crawled recipe websites, standardized the information: Recipe1M dataset
  • 64. • use LSTMs for modeling ingredients and two-stage LSTMs for modeling cooking instructions, deep convolutional networks for image representation • Semantic Regularization learns mapping between images and food categories
  • 69. observations • people annotate their photos in some settings • cannot see some ingredients but may find associations (coke -> sugar?)
  • 71. • Getting social media data – Ex: Twitter stream listener • Geolocating users – Ex: CLAVIN • Linking to health statistics & census – Ex: County Health Rankings & IStat • Labeling images – Ex: Imagga • Linking to nutritional information – Ex: Nutrition lexicon from recipe • Relating diet to obesity
  • 73. Twitter Streaming Pipeline • Need to think of keywords at beginning of project • But get a lot of data over time • Make sure job is always running, restart if necessary, store data in small chunks Twitter Streaming API client Twitter Job Watcher crontab Daily dumps Log
  • 75. is the job still running? JobWatcher.pl what is the last file saved? get today’s formatted date output status if it’s not running, start the job if it’s running but it’s next day, restart it process yesterday’s data
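The decision logic listed for JobWatcher.pl on slide 75 is small enough to sketch. The original is a Perl script run from crontab; the Python version below only mirrors the branching described on the slide, and the action names and file naming scheme are hypothetical.

```python
from datetime import date


def chunk_filename(day):
    """Hypothetical naming scheme for the daily dump files."""
    return f"tweets-{day.isoformat()}.json"


def decide_action(job_running, last_file_date, today):
    """Return what the watcher should do on this run."""
    if not job_running:
        return "start"  # job died: start it again
    if last_file_date is not None and last_file_date < today:
        # Still writing to an older file: roll over to a new daily chunk
        # and hand the finished file to post-processing.
        return "restart_and_process_yesterday"
    return "ok"
```

A crontab entry would call a script wrapping `decide_action` every few minutes and write the returned status to the log.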
  • 80. using non-streaming APIs • careful with rate limits • restrictions on amount of data (back in time, number of items) • deleted content or accounts are not there • get historical interactions with content (likes)
  • 82. geo-location: CLAVIN • https://github.com/Berico-Technologies/CLAVIN • uses allCountries.zip gazetteer file from GeoNames.org
  • 83. In: The Somali Ministry of Information, Posts and Telecommunications started the process of distributing 6,000 hand-held radios to Internally Displaced Persons (IDPs) in Mogadishu. In the first batch, the Ministry handed out 1,000 radios at Badbado camp, Somalia's largest IDP camp. The radios were received by to those most in need: namely, female-headed households, elderly and youth groups. The beneficiaries will receive news and important information concerning relief efforts and public safety messages daily. The small emergency radios are both solar-powered and hand-cranked and can also operate with batteries. The radios can be tuned in to multiple frequencies. The Deputy Minister of Information, Posts and Telecommunications, H.E. Abdullahi Bile Nur, who witnessed the distribution process at Badbado camp, said: "In any emergency, the first priority is the delivery of critical aid, but communities need more than that. They also need information. It is important for them to know where they can get water, where they can get certain facilities, how to access those facilities." "We believe the radios will make a difference in terms of morale and education;" he added. Radio Mogadishu broadcasts a daily show named 'Recovery' (previously 'Help') that is packaged along with the latest announcements and information from humanitarian agencies. The program offers guidance on hygiene and sanitation, nutrition, child education, good neighborhood, becoming productive members of the society, among other key topics. Somalia’s Prime Minister Dr. Abdiweli Mohamed Ali received the European Union envoy to Somalia Alexander Rondes in his office in Mogadishu today. The envoy was accompanied by EU officials and others while Somali minister for defence Hussein Arab Isse and minister for foreign affairs were also present in the meeting. The premier warmly received the EU envoy and thanked him for visiting Mogadishu. He requested the EU to double its efforts in the restoration of peace and stability in Somalia. The two leaders discussed the strategic plans of setting up control and authority in the areas reclaimed from Al-shabab in order to deliver the much needed public services and humanitarian aid to the people. The meeting by the premier and the EU envoy also highlighted the upcoming London meeting which aims at delivering a new international approach to Somalia. The premier stated that the prevalent security made by the government provides an opportune time for consolidation of such gains. The special envoy, who was appointed by EU to represent the horn of Africa region, stated that he will give priority to Somalia. It seems the international community is making concerted effort to the Somalia issue after the government made important strides in security, the new constitution, restructuring the parliament, local administrations’ cooperation and good governance.
    Out:
    Resolved "European Union" as: "European Union" {European Union (No Man's Land, ) [pop: 0] <6695072>}, position: 1590, confidence: 1.000000, fuzzy: false [50.83857,4.37599]
    Resolved "London" as: "london" {com.bericotech.clavin.gazetteer.LazyAncestryGeoName@285818}, position: 2306, confidence: 1.000000, fuzzy: false [51.50853,-0.12574]
    Resolved "Africa" as: "Africa" {Africa (No Man's Land, ) [pop: 1031833000] <6255146>}, position: 2586, confidence: 1.000000, fuzzy: false [7.1881,21.09375]
    Resolved "Muna" as: "Muna" {com.bericotech.clavin.gazetteer.LazyAncestryGeoName@35c23f}, position: 2957, confidence: 1.000000, fuzzy: false [20.48794,-89.71387]
    Resolved "Mataban" as: "Mataban" {com.bericotech.clavin.gazetteer.LazyAncestryGeoName@d460}, position: 4161, confidence: 1.000000, fuzzy: false [5.20401,45.53353]
    Resolved "Birmingham" as: "Birmingham" {com.bericotech.clavin.gazetteer.LazyAncestryGeoName@28866c}, position: 4841, confidence: 1.000000, fuzzy: false [52.48142,-1.89983]
    Resolved "Hiran" as: "Hiran" {com.bericotech.clavin.gazetteer.LazyAncestryGeoName@7b1830}, position: 5208, confidence: 1.000000, fuzzy: false [14.44998,45.57068]
    Resolved "London" as: "london" {com.bericotech.clavin.gazetteer.LazyAncestryGeoName@285818}, position: 6129, confidence: 1.000000, fuzzy: false [51.50853,-0.12574]
    Resolved "East Africa" as: "Portuguese East Africa" {Mozambique (Mozambique, 00) [pop: 22061451] <1036973>}, position: 7589, confidence: 1.000000, fuzzy: false [-18.25,35.0]
    Check quality of matching for most frequent terms!
  • 84. GPS to FIPS (USA) often need smaller administrative unit ID
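Once a GPS coordinate has been mapped to a 15-digit census block FIPS code (slide 84), the smaller administrative units are prefixes of that code: 2 digits for the state, 5 for the county, 11 for the tract. A minimal sketch of the split; the function name is hypothetical and the lookup service that returns the code is assumed to exist upstream.

```python
def split_block_fips(block_fips):
    """Split a 15-digit census block FIPS code into nested geographic IDs."""
    code = str(block_fips).zfill(15)  # restore leading zeros lost in integer storage
    if len(code) != 15 or not code.isdigit():
        raise ValueError(f"not a 15-digit block FIPS code: {block_fips!r}")
    return {
        "state": code[:2],
        "county": code[:5],   # state (2) + county (3)
        "tract": code[:11],   # county (5) + tract (6)
        "block": code,        # tract (11) + block (4)
    }
```

The county prefix is the key typically used to join social media aggregates with county-level health statistics.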
  • 86. Linking to health statistics
  • 87. Linking to health statistics
  • 88. Linking to health statistics
  • 89. Linking to personal health • Crowdsourcing • Personal health, demographics, beliefs… • Experimentation • IRB approval!
  • 98. Face Analyze: curl -X POST "https://api-us.faceplusplus.com/facepp/v3/face/analyze" -F "api_key=<YOUR_API_KEY>" -F "api_secret=<YOUR_API_SECRET>" -F "return_landmark=1" -F "return_attributes=gender,age,smiling,emotion,ethnicity,beauty,mouthstatus,skinstatus" -F "face_tokens=0221733fc86b65b9188e26f6c7b85c56"
  • 100. USDA • Huge list • Includes producer • Too specific • Repetitive • Formal language
  • 101. Recipes • Also huge • Human language • No nutrient info • Social info
  • 102. Recipes + Ingredients + Nutrients • Remove or flag popular ambiguous terms: honey, apple, salsa, … • Remember: food density, not portions!
  • 105. Fetishizing food in digital age: #Foodporn around the world. Mejova, Abbar, Haddadi @ ICWSM'16

Editor's Notes

  1. We calculate the log likelihood ratios (LLR) of each of the canonical names. It is given as the natural logarithm of the ratio between their normalized frequency of occurrence in each food desert of a region, and that in the matching non-food deserts corresponding to each food desert.
  2. Evaluated on ImageNet, an ensemble of these residual networks reaches a 3.57% error rate on the test set. This result won 1st place in the ILSVRC 2015 classification task
  3. Fig: Mean cross-dataset performance of Insta-101 model trained with increasing number of training samples per category. Note that around 2500 samples per category Insta-101 model reaches the performance of Food-101 model. In the training procedure, the final 1000-way softmax in the deep residual model is replaced with a 101-way softmax, and the model is fine-tuned on the Insta-101 and Food-101 datasets individually
  4. “D: AA,H” indicates demographic features pertaining to African Americans and Hispanics
  5. Use averaged vectors of recipes, and perform “geometric transformations in the learned space”
  6. https://console.faceplusplus.com/service/face/intro
  7. https://console.faceplusplus.com/documents/6329465
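The log likelihood ratio described in note 1 can be written as a short function. This is a sketch under assumptions: counts are normalized by the total number of food mentions in each group, and a small `eps` (not mentioned in the notes) is added so that foods absent from one group do not produce a division by zero or log of zero.

```python
import math


def llr(count_fd, total_fd, count_nfd, total_nfd, eps=1e-9):
    """Natural log of the ratio of normalized frequencies (food desert vs matched non-food desert)."""
    freq_fd = count_fd / total_fd
    freq_nfd = count_nfd / total_nfd
    return math.log((freq_fd + eps) / (freq_nfd + eps))
```

Positive values mark foods mentioned relatively more often in the food desert, negative values foods mentioned more often in the matched non-food deserts.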