Computational Social Science Summer School on Methods, special issue on Multimedia. This course describes some of the latest work on using media found on Twitter, Instagram, and other websites to track obesity, diabetes, and mental health.
You Tweet What You Eat: Studying Food Consumption Through Twitter | 유혜수 Hyesoo Yoo
- Researchers analyzed over 50 million tweets from 210,000 US Twitter users to study Americans' food consumption habits by linking mentioned foods to calorie amounts, demographics, and social networks. They found correlations between calorie levels of mentioned foods and state-level obesity rates. The study demonstrated how social media can provide insights into public health issues like diet and obesity on a large scale.
Turning Data into Infographics: An Interactive Workshop for Problem Solvers | UNCResearchHub
This document provides an overview of creating infographics from data. It discusses finding relevant data from government, commercial, think tank and hybrid sources. It also covers best practices for exploring data to find patterns and stories, visualizing data in infographics, and critiquing infographics. The workshop teaches how to plan infographics based on data about food insecurity in the US and sketch an example infographic on this topic. Resources for creating and finding inspiration for infographics are also listed.
The data and Information Literacy of runners: quantifying diet and activity | Pamela McKinney
Presentation for the European Conference on Information Literacy, 24-27 September 2018, Oulu, Finland. Reports on a quantitative study that investigated the health, diet and fitness tracking behaviours of members of the Parkrun organisation in the UK.
Conducted a Poisson count regression assessing the relationship between racial composition and three food store varieties: healthy, unhealthy and provisional locations. I used ArcGIS to count the number of food locations per census tract and then regressed the counts on the racial composition of the neighborhoods in Milwaukee.
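As a rough illustration of the kind of count model described above, the following minimal sketch fits a one-predictor Poisson regression by Newton-Raphson in plain Python. The function name `fit_poisson`, the single-covariate setup, and the tract values are all hypothetical and are not taken from the Milwaukee study.

```python
import math


def fit_poisson(x, y, iterations=50, tol=1e-10):
    """Fit log E[y] = b0 + b1 * x by Newton-Raphson and return (b0, b1)."""
    # Start from the intercept-only fit so the first Newton step is stable.
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector: gradient of the Poisson log-likelihood.
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Fisher information: a symmetric 2x2 matrix.
        h00 = sum(mu)
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system with the explicit inverse.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


# Hypothetical data: share of one racial group per tract, and store counts.
share = [0.1, 0.3, 0.5, 0.7, 0.9]
stores = [2, 3, 3, 5, 7]
b0, b1 = fit_poisson(share, stores)
# exp(b1) is the multiplicative change in the expected count as the share
# moves from 0 to 1.
print(f"intercept={b0:.3f} slope={b1:.3f} rate ratio={math.exp(b1):.3f}")
```

A real analysis would more likely use a statistics package, include several covariates, and check for overdispersion; the sketch only shows the log-linear structure of the model.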
This document provides information about data sources and tools that can be used for community assessments related to aging. It lists numerous federal and local government data sources that provide statistics on demographics, housing, transportation, health, employment and other topics. It also summarizes several community assessment tools and indexes that use indicators to evaluate factors like livability, sustainability, and successful aging for communities. These tools involve surveys, interviews, and observations and measure areas such as basic needs, social engagement, health, transportation and housing.
Univ 291 mercy housing lakefront final presentation | lapinsklauren
The document summarizes a research project between Loyola University students and Mercy Housing Lakefront's Tenant Leadership Committee. It provides background on Mercy Housing Lakefront, context for the research including proposed budget cuts, research questions about food resources and how budget cuts have impacted tenants, a description of data collection including a survey and map of food resources, and results and findings from analyzing the quantitative and qualitative data.
Logging in 3 communities ECIL conference 2021 | Pamela McKinney
Presentation developed with Andrew Cox and Laura Sbaffi to summarise our quantitative research into Food and activity tracking in 3 communities of participants - people who run for leisure with Parkrun, people with type 2 diabetes who are members of the Diabetes.co.uk online community, and members of the IBS Network charity.
Univ 291 mercy housing lakefront final presentation! | msullivan4
The document summarizes a research project between Loyola University students and Mercy Housing Lakefront's Tenant Leadership Committee. It provides background on Mercy Housing Lakefront, context for the research including proposed budget cuts, research questions about food resources and how budget changes have impacted tenants, a description of data collection including a survey and map of food resources, and results and findings from analyzing the quantitative and qualitative data.
The document summarizes a research project between Loyola University students and Mercy Housing Lakefront's Tenant Leadership Committee. It provides background on Mercy Housing Lakefront, context for the research including proposed budget cuts, research questions about food resources and how budget changes have impacted tenants, and an overview of the data collection process including a survey of 50 tenants and mapping of local food resources.
Social Media Research and Practice in the Health Domain - Tutorial, Part II | Ingmar Weber
This document discusses social media research in the health domain and presents three case studies on using social media data for health-related observational studies. It addresses some key data and technical challenges, including issues of representativeness, truthfulness, and data quality. Validation techniques discussed include comparing findings to population health statistics, online surveys, sensor data, and medical records. The document also provides an overview of common data sources for health research like Twitter, Reddit, and Facebook advertising estimates. It describes basic and advanced analytical methods like social network analysis, matching methods, and different types of regression to model observational data.
Digital Demography - WWW'17 Tutorial - Part II | Ingmar Weber
The document provides an agenda for a digital demography workshop covering traditional demography models and new opportunities using digital data sources. The workshop is split into two parts, with the morning covering standard demography topics and the afternoon focusing on 16 case studies applying digital data to questions of fertility, mortality, and migration. The case studies demonstrate using sources like search engine queries, online genealogy records, social media posts, and email metadata to study demography trends. The document also introduces the two workshop presenters and their backgrounds in computational sociology and demographic research.
Sdal air health and social development (jan. 27, 2014) final | kimlyman
This document summarizes a workshop on health and social development analytics using big data. It discusses how data sources are becoming larger, more diverse and used for multiple purposes. This presents opportunities to better understand issues but also challenges around privacy, bias and data quality. The workshop aims to identify partnership opportunities and prototype projects using integrated data to address health and social issues. Case studies from various institutions are presented using combined data sources like medical records, surveys and environmental factors.
Needs assessment training for Cycle IV of the "Identifying our Needs: A Survey of Elders" needs assessment - for participating tribes, Title VI, and int
Combining Behaviors and Demographics to Segment Online Audiences: Experiments ... | Joni Salminen
Link to article: https://www.springerprofessional.de/en/combining-behaviors-and-demographics-to-segment-online-audiences/16204306
CITE: Jansen, Bernard J., Jung, S., Salminen, J., An, J. and Kwak, H. (2018), “Combining Behaviors and Demographics to Segment Online Audiences: Experiments with a YouTube Channel”, Proceedings of the 5th International Conference of Internet Science (INSCI 2018), Springer, St. Petersburg, Russia.
Link to Automatic Persona Generation: https://persona.qcri.org
The document proposes a gap calculator tool to quantify and address hunger in the United States. The tool would combine data on existing federal, state, local and charitable food resources and compare it to estimates of food need. This would calculate a "gap" nationally, by state, county and congressional district. The results would be displayed graphically and show how viewers can help close gaps through advocacy, donations and volunteering. The goal is to educate the public on hunger and empower them to find local solutions.
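The gap idea described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `food_gap`, the region codes, the unit, the figures, and the choice to floor a surplus at zero are all hypothetical and do not come from the proposal.

```python
def food_gap(need, resources):
    """Return the unmet need per region, floored at zero.

    need:      {region: estimated need}
    resources: {region: {source: amount}}; a region missing here has none.
    """
    gaps = {}
    for region, required in need.items():
        # Add up every source (federal, state, local, charitable) for the region.
        supplied = sum(resources.get(region, {}).values())
        # Assumption: a surplus is reported as a gap of zero, not a negative gap.
        gaps[region] = max(required - supplied, 0)
    return gaps


# Hypothetical figures in an arbitrary unit such as meals per year.
need = {"county_a": 1000, "county_b": 400}
resources = {
    "county_a": {"federal": 600, "state": 150, "charitable": 100},
    "county_b": {"federal": 350, "charitable": 120},
}
for region, gap in food_gap(need, resources).items():
    print(region, gap)
```

The same function could be applied at any level of aggregation (state, county, congressional district) as long as need and resources are keyed by the same regions.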
This document summarizes a proposed community-based research study on older adults living with HIV (OALWH) in San Diego. It discusses:
1) Engaging OALWH and community members to provide input and participate in HIV and aging research through a steering committee called "Platinum".
2) Developing a "village model" using a social app to help OALWH age in place by meeting their social and support needs within their community.
3) Conducting needs assessments, focus groups, and a pilot study using a social app to facilitate creation of a village for OALWH in central San Diego aimed at reducing isolation and improving access to services.
ECIR2017-Inferring User Interests for Passive Users on Twitter by Leveraging ... | GUANGYUAN PIAO
This document summarizes a study that leverages followee biographies on Twitter to infer interests for passive users. The study extracted entities from followee bios and names, finding bios provided over twice as many entities on average. It then used these entities to build user interest profiles and test different modeling strategies. Strategies that used followee bios outperformed those using only names, with interest propagation through DBpedia performing best. The authors conclude leveraging followee bios can provide more informative user profiles for improving recommendation performance.
The document outlines a proposal for a mobile app called Foodprint that calculates a user's carbon footprint based on their food selections. It would suggest healthier and lower-carbon alternatives and locate local farmers markets and CSAs. User research found interest in making informed, sustainable food choices but lack of time and information hinders this. The proposal describes app features, personas, prototypes, and plans to drive membership for partner organizations. Testing led to improvements making key actions more prominent. Next steps include social features to increase engagement and links to revenue.
Logging in 3 communities - lightning talk festivIL 2021 | Pamela McKinney
Lightning talk (5 minute) presentation given at the online FestivIL conference, June 2021, about research into the information literacy of food and activity tracking in three communities: parkrunners, people with type 2 diabetes, and people with Irritable Bowel Syndrome.
This paper looks at the problem of privacy in the context of Online Social Networks (OSNs). In particular, it examines the predictability of different types of personal information based on OSN data and compares it to the perceptions of users about the disclosure of their information. To this end, a real life dataset is composed. This consists of the Facebook data (images, posts and likes) of 170 people along with their replies to a survey that addresses both their personal information, as well as their perceptions about the sensitivity and the predictability of different types of information. Importantly, we evaluate several learning techniques for the prediction of user attributes based on their OSN data. Our analysis shows that the perceptions of users with respect to the disclosure of specific types of information are often incorrect. For instance, it appears that the predictability of their political beliefs and employment status is higher than they tend to believe. Interestingly, it also appears that information that is characterized by users as more sensitive, is actually more easily predictable than users think, and vice versa (i.e. information that is characterized as relatively less sensitive is less easily predictable than users might have thought).
Representation for conducting surveys aimed at defined demographic audiences for collecting market insights and intelligence for strategic campaign planning.
Health Datapalooza IV: June 3rd-4th, 2013
Datalab
Moderators:
Todd Park, Chief Technology Officer, United States
Damon Davis, Health Data Initiative Program Director, Department of Health and Human Services
Speakers:
Susan Queen, Director, Division of Data Policy, Office of the Assistant Secretary for Planning and Evaluation
Steve Cohen, Director, Center for Financing, Access and Cost Trends, Agency for Healthcare Research & Quality
Rick Moser, National Institutes of Health
Victor Lazzaro, Performance & Data Analytics Manager, Office of the National Coordinator for Health IT
Niall Brennan, Director of the Office of Information Products and Data Analytics, Centers for Medicare & Medicaid Services
Miya Cain, Office of the Assistant Secretary, Administration for Children and Families, US Department of Health and Human Services
Edward Salsberg, Director, National Center for Health Workforce Analysis, Health Resources and Services Administration
Robert Post, Environmental Protection Agency (EPA)
Eugene Hayes, the Substance Abuse and Mental Health Services Administration (SAMHSA)
Jim Craver, Centers for Disease Control and Prevention (CDC)
David Forrest, Senior Advisor, Health and Human Services Office of the Chief Technology Officer
Tania Allard, Director of Intergovernmental Affairs & Special Projects, New York State Department of Health
Steven Edwards, Environmental Protection Agency
Steve Emrick, National Library of Medicine
Carol A. Gotway Crawford, Director of Behavioral Surveillance, Centers for Disease Control
This perennial favorite breakout session is back! This is the best opportunity to meet some of the federal government data experts who champion action in improving public access to information to catalyze innovation. Come learn how to use assets from the Department of Health & Human Services (HHS), the Department of Agriculture (USDA), the Environmental Protection Agency (EPA) and more. Each agency in the federal government is staffed by experts who are well versed in the information resources available from their division on data.gov (administrative data, survey data, research data, medical/scientific content, etc.). The Datalab will also feature opportunities for one-on-one meet-ups with data experts for “deep dives” into agencies’ resources. Participants can join live demonstrations and check out new data resources and tools. The goal of the session is to give innovators and entrepreneurs an overview of new, updated, and emerging datasets that can be used to support new applications and services.
Emerging issues in understanding evidence from complex, public health interventions
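Presentation by Ross C. Brownson at the conference "Recherche interventionnelle contre le cancer : Réunir chercheurs, décideurs et acteurs de terrain" (Intervention research against cancer: bringing together researchers, decision-makers and practitioners), 17-18 November 2014, BnF, Paris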
This webinar will demonstrate how to use HealthyCity.org to enhance your grant proposals and reports with visually impactful and relevant data and maps. Learn how to access data highlighting the needs and opportunities within your communities and how to make the case that your program will make a difference.
This presentation analyzed data from 200,000 Causes users to understand user types. Quantitative methods like k-means clustering and decision trees identified 6 personas. Surveys and interviews, with 1,466 and 65 users respectively, provided qualitative data. The results identified six main user types of Causes: Casual Activist, Self-Assured Millennial, Practical Activist, Ambitious Activist, Organized Retiree, and Tenacious Veteran Activist. Future work will engage users through tailored approaches based on their needs and identify ideal user types.
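To make the clustering step mentioned above concrete, here is a minimal k-means sketch in plain Python. It is not the presenters' pipeline: the function name `kmeans`, the two behavioural features, the toy values, and the deterministic "first k points" initialisation are all assumptions chosen to keep the example small and repeatable.

```python
def kmeans(points, k, iterations=100):
    """Cluster points with Lloyd's algorithm; return (assignment, centroids)."""
    # Assumption: seed centroids with the first k points so runs are repeatable.
    centroids = [list(p) for p in points[:k]]
    assignment = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        # (squared Euclidean distance).
        new_assignment = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the algorithm has converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


# Hypothetical features per user: (actions per month, causes joined).
users = [(0, 0), (10, 10), (0, 1), (10, 11), (1, 0), (11, 10)]
labels, centers = kmeans(users, k=2)
print(labels)
print(centers)
```

In practice the features would be scaled first and the number of clusters chosen by inspection or a validity index; a decision tree could then be trained on the cluster labels to describe each persona in readable rules.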
Stakeholders of Organic Products in Mexico and Korea | Xanat V. Meza
The document summarizes a study that analyzed stakeholders of organic products in Mexico and South Korea using Twitter data. Key findings include:
1) The Twitter network for organic products was larger in South Korea than Mexico in terms of nodes and edges. Both networks increased in size over time.
2) Influential players in Mexico included media outlets and suppliers, while those in South Korea included "others" and suppliers.
3) Stakeholder types like consumers, suppliers, media, and government played a role in diffusing information about organic products on Twitter networks in both countries.
Modeling Human Values with Social Media | Yelena Mejova
IC2S2 2019 Tutorial by Kyriaki Kalimeri and Yelena Mejova. Overview of theories on values and examples of studies that track values using social media in domains of politics, religion, and nutritional health.
Similar to Social Media for Lifestyle Health: Multimedia (20)
10. geo-location
• GPS tagging of tweets (<5%)
• User location strings (up to 65%)
• Locations mentioned in tweets (unreliable)
11. population level
average caloric value of all foods mentioned in tweets, using exact keyword matching
vs
obesity rate (Centers for Disease Control and Prevention)
food keyword frequency vs obesity
r = 0.56***
final model: demographics + detected foods
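A minimal sketch of the population-level comparison on slide 11: score each tweet by the mean caloric value of the foods it mentions (exact keyword matching), then correlate an aggregate with obesity rates. The `CALORIES` lexicon and its values are hypothetical placeholders, not the study's lexicon.

```python
import math
import re

# Hypothetical calorie lexicon (kcal per serving); illustrative values only.
CALORIES = {"pizza": 285, "salad": 150, "burger": 354, "apple": 95}


def mean_calories(tweet):
    """Mean caloric value of lexicon foods found in a tweet by exact token match."""
    tokens = re.findall(r"[a-z]+", tweet.lower())
    hits = [CALORIES[t] for t in tokens if t in CALORIES]
    return sum(hits) / len(hits) if hits else None


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

In use, `mean_calories` would be averaged per state and passed to `pearson` together with the matching state-level obesity rates.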
12. • Education & Income:
– US Census at ZIP code level
• Gender:
– genderize.io: 37% f / 32% m
– neither:
– 26.7% not human, excluded
13. crowdsourcing
• Got a complicated task that computational tools cannot handle (well enough)?
• Break it into small pieces and have random people on the internet do it
• Cheap per task
• Several labels per task for quality assessment
• Are people any good at your task? (upper bound)
16. individual level
• WeFollow prominent user lists, followership as proxy for interest
top 15 factors by magnitude of coefficient
17. network level
• Friendship & Mention networks
• Social activation: users above 90th percentile in terms of obesity and/or diabetes score (personalized using Ridge regression on foods a user has mentioned)
• Threshold model: success of a social diffusion process depends on reaching a certain critical number of adopters
• Activation probability given x of your neighbors are activated users
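The last bullet can be sketched as an empirical estimate: for each user, count the activated neighbors x, then report the share of users with that x who are themselves activated. This is an illustrative sketch that assumes an undirected edge list and a precomputed set of activated users (for example, those above the 90th percentile); the function name and inputs are hypothetical.

```python
from collections import defaultdict


def activation_by_neighbors(edges, activated):
    """Return {x: share of users with x activated neighbors who are activated}."""
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    totals = defaultdict(int)  # users with exactly x activated neighbors
    active = defaultdict(int)  # ...of which are activated themselves
    for user, nbrs in neighbors.items():
        x = sum(1 for n in nbrs if n in activated)
        totals[x] += 1
        if user in activated:
            active[x] += 1
    return {x: active[x] / totals[x] for x in sorted(totals)}
```

A curve that rises sharply once x passes some value would be consistent with the threshold model described above.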
18. network level (panels: Friendship Network, Mention Network)
• Content spread: remove replies & retweets
• Geography: remove links from same state
19. Social Media Food Deserts
Characterizing dietary choices, nutrition, and language in
food deserts via social media
De Choudhury, Sharma, Kiciman @ CSCW'16
20. Food desert:
– Low-income census tracts with a substantial number or share of residents with low levels of access to retail outlets selling healthy and affordable foods
• an estimated 13.5 million people in the United States have low access to a supermarket or large grocery store, with 82 percent living in urban areas.
https://www.ers.usda.gov/data-products/food-access-research-atlas/documentation/
24. • need to control for confounding variables when comparing to a control set
Figure labels: control, treatment
25. Socioeconomic Variables
population
% minority population
#households
#families
% non-Hispanic whites
median house age
median family income
owner occupied housing units
distressed/underserved tract
Similarity via Mahalanobis distance
Selection via k Nearest Neighbors
Discard FD tracts without good enough matches
matching
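A minimal sketch of the matching step, assuming each tract is a row of numeric socioeconomic variables. It uses NumPy, estimates the covariance from the pooled tracts, and treats `max_dist` as a hypothetical cutoff for "good enough" matches; none of these choices are confirmed by the slides.

```python
import numpy as np


def match_tracts(treated, controls, k=1, max_dist=None):
    """Match each treated (food desert) row to its k nearest control rows
    by Mahalanobis distance; drop treated rows whose best match is too far."""
    treated = np.asarray(treated, dtype=float)
    controls = np.asarray(controls, dtype=float)

    # Inverse covariance estimated on all tracts (pseudo-inverse for stability).
    cov = np.cov(np.vstack([treated, controls]), rowvar=False)
    inv_cov = np.linalg.pinv(cov)

    matches = {}
    for i, row in enumerate(treated):
        diff = controls - row
        sq = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
        dist = np.sqrt(np.maximum(sq, 0.0))
        nearest = np.argsort(dist)[:k]
        if max_dist is not None and dist[nearest[0]] > max_dist:
            continue  # discard: no good enough match
        matches[i] = [(int(j), float(dist[j])) for j in nearest]
    return matches
```

Mahalanobis distance rescales each variable by its variance and accounts for correlation between variables, so income and household counts can be compared on one scale.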
27. • differences in nutrition between FD and matched NFD can be big
• but they also vary across the region
Statistical significance in nutritional attributes of FDs and NFDs, with Bonferroni adjustment of α
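The Bonferroni adjustment divides the significance level α by the number of tests performed. A small sketch, assuming α = 0.05 as the unadjusted level (the slides do not state the value used):

```python
def bonferroni(p_values, alpha=0.05):
    """Return the adjusted threshold alpha/m and a significance flag per test."""
    m = len(p_values)
    threshold = alpha / m
    return threshold, [p <= threshold for p in p_values]
```

With four nutritional attributes tested, for instance, each p-value would need to fall at or below 0.05 / 4 = 0.0125.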
28. predicting whether a tract is a food desert
• S = socioeconomic
• F = food deprivation
• T = topic distribution
classification
Instagram topic information helps!
29. error analysis
• social media helps identify recent developments like gentrification
Atlanta, Georgia
Popular topics: “smoothie”, “organic”, “farmtotable”, “baking”
30. Social Media Images Disclosure
Is #Saki Delicious?: The Food Perception Gap on Instagram
and Its Relation to Health
Ofli, Aytar, Weber, al Hammouri, Torralba @ WWW'17
31. • most studies take hashtags as true description of image content, but are they?
32. data collection
• query: #food, #foodporn, #foodie, #breakfast, #lunch, #dinner
• 72M images, 26M with location
• 4M assigned to US county
• 3.7M on hashtags
Diagram labels: canonical food lexicon; food-related posts; geo-localized food-related posts; 10,000 hashtags as food-related
33. geo-location 3
• Geo-location:
– county shape files from US Census
• https://www.census.gov/programs-surveys/geography.html
– shapely python library
• https://github.com/Toblerity/Shapely
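The lookup behind this step is a point-in-polygon test: given a post's longitude and latitude, find the county polygon that contains it. In practice shapely handles this with `Polygon.contains(Point(lon, lat))` on shapes read from the Census files; the sketch below shows the same idea in plain Python with a ray-casting test. The county ids and square polygons are toy placeholders.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray casting: count how many polygon edges a ray to the east crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > lat) != (y2 > lat):  # edge spans the point's latitude
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def locate(lon, lat, counties):
    """Return the id of the first county polygon containing the point, else None."""
    for county_id, polygon in counties.items():
        if point_in_polygon(lon, lat, polygon):
            return county_id
    return None


# Toy example: two adjacent square "counties" as (lon, lat) vertex lists.
counties = {
    "county_a": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "county_b": [(2, 0), (4, 0), (4, 2), (2, 2)],
}
```

Real county shapes can have holes and multiple parts, which this sketch ignores; a geometry library is the safer choice for production use.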
34. image labeling
• Deep residual network – “can train substantially deeper models”
– He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 770-778).
– In the training procedure, the final 1000-way softmax in the deep residual model is replaced with a 101-way softmax, and the model is fine-tuned on the Insta-101 and Food-101 datasets individually
35. image labeling
L. Bossard, M. Guillaumin, and L. Van Gool. Food-101–mining
discriminative components with random forests. In European
Conference on Computer Vision, 2014.
101 food categories, 101,000 images (750 train / 250 test)
Instagram images matched manually to Food-101 categories
(4000 train / 250 test, no manual cleaning)
36. perception gap
• difference in how a machine and a human annotate a given image
• then aggregated for user (no one user contributes too much), then per county
37. perception gap
machine is more likely than human to label post #instabeer in places where there is higher food insecurity
machine is less likely than human to label post #sushiroll in places where there is higher food insecurity
machine is less likely than human to label post #chicagopizza in places where there are more alcohol-related driving deaths
38. variation in subjective labels
• for a subjective user tag j, and each machine tag i, compute P(j|i)
• computed first for a user, then aggregated per county
• focus on #healthy, #delicious, #organic
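A sketch of the two-stage computation: estimate P(j|i) within each user's posts, then combine the user-level values per county. Averaging across users is an assumption here, since the slide only says "aggregated"; the data layout, one `(machine_tag, user_tags)` pair per post, is also hypothetical.

```python
from collections import Counter, defaultdict


def conditional_tag_prob(posts, subjective_tag):
    """posts: list of (machine_tag, set_of_user_tags). Returns {i: P(j|i)}."""
    seen, both = Counter(), Counter()
    for machine_tag, user_tags in posts:
        seen[machine_tag] += 1
        if subjective_tag in user_tags:
            both[machine_tag] += 1
    return {i: both[i] / seen[i] for i in seen}


def county_average(user_posts, user_county, subjective_tag):
    """Compute P(j|i) per user first, then average user values in each county."""
    per_county = defaultdict(lambda: defaultdict(list))
    for user, posts in user_posts.items():
        probs = conditional_tag_prob(posts, subjective_tag)
        for i, p in probs.items():
            per_county[user_county[user]][i].append(p)
    return {
        county: {i: sum(v) / len(v) for i, v in tags.items()}
        for county, tags in per_county.items()
    }
```

Computing per user first keeps a single prolific account from dominating a county's estimate.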
41. observations
• Social media can be a source of cheap training data
• Images provide alternative “view” to hashtags
42. Social Media Images Health
Social Media Image Analysis for Public Health
Garimella, Alfayad, Weber @ CHI’16
43. data collection
• geo-located Instagram posts at restaurants
• mapped to US counties using Federal Communications Commission (FCC) API
• top 100 counties by image count used
• 2,000 images randomly selected for each county
44. Imagga
• returns tags with a confidence score (use tags with at least 20% confidence)
• tags appearing in fewer than 10 counties are ignored
• free account limit: 1 image per second
try it! https://imagga.com/auto-tagging-demo
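The two filtering rules on this slide can be sketched as a post-processing step on already-downloaded tags. The input layout (county mapped to a list of `(tag, confidence_percent)` pairs) is a hypothetical stand-in and does not reproduce the Imagga response format.

```python
from collections import defaultdict


def filter_tags(county_tags, min_conf=20.0, min_counties=10):
    """Keep tags with confidence >= min_conf that occur in >= min_counties counties."""
    # Step 1: drop low-confidence tags within each county.
    kept = {
        county: {tag for tag, conf in tags if conf >= min_conf}
        for county, tags in county_tags.items()
    }
    # Step 2: count in how many counties each surviving tag appears.
    n_counties = defaultdict(int)
    for tags in kept.values():
        for tag in tags:
            n_counties[tag] += 1
    # Step 3: drop tags that are too rare across counties.
    return {
        county: {tag for tag in tags if n_counties[tag] >= min_counties}
        for county, tags in kept.items()
    }
```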
45. predicting public health
Ridge regression with smoothing parameter α=0.1
U = user-provided tags, I = machine-generated tags, D = demographics
Showing correlation between predicted health statistic and known
Statistical significance of demography-only baseline
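For reference, ridge regression with α = 0.1 has a closed-form solution. The sketch below uses NumPy, centers the data so the intercept is not penalized (an assumption about the setup), and evaluates with the Pearson correlation between predicted and known values as on the slide. Feature construction from U, I and D is not shown.

```python
import numpy as np


def ridge_fit(X, y, alpha=0.1):
    """Closed-form ridge: w = (Xc'Xc + alpha*I)^-1 Xc'yc, intercept unpenalized."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    gram = Xc.T @ Xc + alpha * np.eye(X.shape[1])
    w = np.linalg.solve(gram, Xc.T @ (y - y_mean))
    b = y_mean - x_mean @ w
    return w, b


def ridge_predict(X, w, b):
    return np.asarray(X, dtype=float) @ w + b


def pearson_r(predicted, known):
    """Correlation between predicted and known health statistics."""
    return float(np.corrcoef(predicted, known)[0, 1])
```

In a real evaluation the correlation would be computed on held-out counties rather than on the training data.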
48. Social Media Images Mental Health
What Twitter Profile and Posted Images Reveal
about Depression and Anxiety
Guntuku, Preotiuc-Pietro, Eichstaedt, Ungar @ ICWSM’19
49. data collection
• surveys administered on platform Qualtrics
• demography, Beck Depression Inventory
• informed consent
• Twitter handles
• 560 people with 20 or more images posted
• + Facebook dataset, text only
Figure labels: high depression, low depression
50. image description
• Hue – Saturation – Value (HSV)
• Hue count (professional photos have fewer hues)
• 6-bin and 12-bin color histograms
• Warm & cold colors
• Aesthetics: deep CNN that produces labels such as object emphasis, rule of thirds, symmetry, motion blur, vivid color…
• Content: Imagga tags (top 10) clustered via Normalized Pointwise Mutual Information (NPMI)
• Content: VGG-Net image classifier for 1,000 objects
• Face features via Face++ and EmoVu APIs (emotions, smiling)
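The color features at the top of this list can be approximated with the standard-library `colorsys` module. This is a rough sketch: the number of hue bins, the 5% share needed for a hue to count, and the definition of "warm" as hues within 60 degrees of red are all assumptions, not the paper's definitions.

```python
import colorsys


def color_features(pixels, hue_bins=20, min_share=0.05):
    """pixels: iterable of (r, g, b) with channels in 0..255."""
    hsv = [colorsys.rgb_to_hsv(r / 255, g / 255, b / 255) for r, g, b in pixels]
    n = len(hsv)

    # Hue histogram; hue count = number of bins holding a meaningful share.
    hist = [0] * hue_bins
    for h, _, _ in hsv:
        hist[min(int(h * hue_bins), hue_bins - 1)] += 1
    hue_count = sum(1 for c in hist if c / n >= min_share)

    # Note: gray pixels get hue 0 from colorsys and so count as warm here.
    warm = sum(1 for h, _, _ in hsv if h < 1 / 6 or h >= 5 / 6)

    return {
        "mean_saturation": sum(s for _, s, _ in hsv) / n,
        "mean_value": sum(v for _, _, v in hsv) / n,
        "hue_count": hue_count,
        "warm_share": warm / n,
    }
```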
51. Pearson correlations between color and aesthetic features extracted from images and mental health conditions, and with age and gender (coded as 1 for female, 0 for male) separately. Correlations for mental health conditions are controlled for age, gender and the other mental health condition. Only significant correlations (p < .01, two-tailed t-test) are presented.
52. “while depressed users preferred images which are not sharp and which do not show face, anxious users usually chose sharper images with multiple people in them”
54. mental health classification
Using visual features of posted images in predicting mental health conditions. Using
linear regression with ElasticNet regularization. Performance measured via Pearson
correlation (MSE in brackets). ST – single task, MT – multi-task
Additional training on text-based features (users labeled for age and gender) boosts
performance up to r = .167 for depression and r = .223 for anxiety.
55. observations
• easier to model populations than individuals
• easier to model behaviors than mental states
• text often describes the images (hashtags)
• getting ground truth can be difficult
56. Could you do it better?
Can people guess a person’s BMI better from a picture than a machine?
Face-to-BMI: Using Computer Vision to Infer Body Mass Index on Social Media
Garimella, Kocabey, Camurcu, Ofli, Aytar, Marin, Torralba, Weber @ ICWSM’17
57. Male
30 years old
6 feet 5 inches tall
Starting weight 385lb
Current weight 310lb
Goal weight: 225lb
58. data collection
• Cropping images with known gender, height, and weight from Reddit
• 4206 faces with BMI = body mass in kg / (body height in m)²
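As a worked example of the BMI formula, the profile on slide 57 (6 feet 5 inches, 385 lb starting weight, 310 lb current weight) can be converted to metric units. The helper names are illustrative.

```python
LB_TO_KG = 0.45359237  # exact definition of the pound in kilograms
IN_TO_M = 0.0254       # exact definition of the inch in meters


def bmi(weight_kg, height_m):
    """Body mass index: mass in kg divided by squared height in m."""
    return weight_kg / height_m ** 2


def bmi_imperial(weight_lb, feet, inches):
    """Convert pounds and feet/inches to metric, then compute BMI."""
    height_m = (feet * 12 + inches) * IN_TO_M
    return bmi(weight_lb * LB_TO_KG, height_m)
```

For that profile, the current weight gives a BMI of roughly 36.8 and the starting weight roughly 45.7.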
59. face-to-BMI modeling
• Use pre-trained deep learning features:
– general object classification (VGG-Net)
– face recognition task (VGG-Face)
• Then train on the BMI dataset, test on held-out set
60. crowdsourcing BMI
• simpler than guessing BMI: comparing BMI of two pictures (M-M, F-F, M-F)
• use Amazon Mechanical Turk, 3 labels per task
• compared to machine, human performance differs by 2%
61. bias?
• Algorithm could be learning existing stereotypes (ex: African Americans tend to have higher obesity rates in US)
• Try balanced set of 2000 male-female pairs, 1037 chosen higher BMI for females (p = 0.05)
• Try balanced set of 2000 White-African American pairs, 1085 chosen higher BMI for White (p = 0.05)
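One way to gauge how far a count such as 1037 or 1085 out of 2000 sits from an even split is an exact binomial tail probability. The slides do not say which test was used, so the one-sided test against a 50/50 null below is an assumption for illustration only.

```python
from math import comb


def binomial_tail(successes, n):
    """P(X >= successes) for X ~ Binomial(n, 0.5), computed exactly."""
    favorable = sum(comb(n, k) for k in range(successes, n + 1))
    return favorable / 2 ** n


p_gender = binomial_tail(1037, 2000)  # male-female pairs
p_race = binomial_tail(1085, 2000)    # White-African American pairs
```

Python's arbitrary-precision integers keep the sum exact before the final division, so no approximation is involved beyond the choice of test.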
63. • combining recipe information (ingredients + instructions) with images
• crawled recipe websites, standardized the information: Recipe1M dataset
64. • use LSTMs for modeling ingredients and two-stage LSTMs for modeling cooking instructions, deep convolutional networks for image representation
• Semantic Regularization learns mapping between images and food categories
71. • Getting social media data
– Ex: Twitter stream listener
• Geolocating users
– Ex: CLAVIN
• Linking to health statistics & census
– Ex: County Health Rankings & IStat
• Labeling images
– Ex: Imagga
• Linking to nutritional information
– Ex: Nutrition lexicon from recipe
• Relating diet to obesity
73. Twitter Streaming Pipeline
• Need to think of keywords at beginning of project
• But get a lot of data over time
• Make sure job is always running, restart if necessary, store data in small chunks
Diagram components: Twitter, Streaming API client, Twitter Job Watcher (crontab), Daily dumps, Log
75. is the job still running?
JobWatcher.pl
what is the last file saved?
get today’s formatted date
output status
if it’s not running, start the job
if it’s running but it’s next day, restart it
process yesterday’s data
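The watcher's decision logic can be sketched in Python (the original JobWatcher.pl is a Perl script whose code is not shown here). The dump file naming `tweets-YYYY-MM-DD.json` and the action names are hypothetical; a cron entry would call this periodically and carry out the returned actions.

```python
from datetime import date, datetime


def last_dump_date(filenames, prefix="tweets-", suffix=".json"):
    """Date of the most recent daily dump, parsed from hypothetical file names."""
    dates = [
        datetime.strptime(name[len(prefix):-len(suffix)], "%Y-%m-%d").date()
        for name in filenames
        if name.startswith(prefix) and name.endswith(suffix)
    ]
    return max(dates) if dates else None


def plan_actions(is_running, last_dump, today):
    """Decide what the watcher should do on this run."""
    if not is_running:
        return ["start"]  # job died: start it again
    if last_dump is not None and last_dump < today:
        # Day rolled over: restart to open a new dump, then process the old one.
        return ["restart", "process_yesterday"]
    return []  # running and writing today's file: nothing to do
```

Keeping the decision separate from the side effects (starting processes, moving files) makes the watcher easy to test without touching a live stream.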
80. using non-streaming APIs
• careful with rate limits
• restrictions on amount of data (back in time, number of items)
• deleted content or accounts are not there
• get historical interactions with content (likes)
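Staying within a rate limit usually means pausing before each request once a quota is used up. The sliding-window throttle below is a generic sketch, not tied to any particular API; the quota values are supplied by the caller, and the injectable `clock` and `sleep` exist only to make it testable.

```python
import time


class Throttle:
    """Allow at most max_calls within any window of per_seconds."""

    def __init__(self, max_calls, per_seconds, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.per_seconds = per_seconds
        self.clock = clock
        self.sleep = sleep
        self.calls = []  # timestamps of recent calls

    def _prune(self, now):
        self.calls = [t for t in self.calls if now - t < self.per_seconds]

    def wait(self):
        """Block until another call is allowed, then record it."""
        now = self.clock()
        self._prune(now)
        if len(self.calls) >= self.max_calls:
            self.sleep(self.per_seconds - (now - self.calls[0]))
            now = self.clock()
            self._prune(now)
        self.calls.append(now)
```

Calling `throttle.wait()` before every request keeps a long-running collection job inside the quota without hand-tuned sleeps.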
83. The Somali Ministry of Information, Posts and Telecommunications started the process of distributing
6,000 hand-held radios to Internally Displaced Persons (IDPs) in Mogadishu. In the first batch, the
Ministry handed out 1,000 radios at Badbado camp, Somalia's largest IDP camp. The radios were
received by to those most in need: namely, female-headed households, elderly and youth groups.
The beneficiaries will receive news and important information concerning relief efforts and public
safety messages daily. The small emergency radios are both solar-powered and hand-cranked and can
also operate with batteries. The radios can be tuned in to multiple frequencies.
The Deputy Minister of Information, Posts and Telecommunications, H.E. Abdullahi Bile Nur, who
witnessed the distribution process at Badbado camp, said: "In any emergency, the first priority is the
delivery of critical aid, but communities need more than that. They also need information. It is
important for them to know where they can get water, where they can get certain facilities, how to
access those facilities."
"We believe the radios will make a difference in terms of morale and education;" he added. Radio
Mogadishu broadcasts a daily show named 'Recovery' (previously 'Help') that is packaged along with
the latest announcements and information from humanitarian agencies. The program offers guidance
on hygiene and sanitation, nutrition, child education, good neighborhood, becoming productive
members of the society, among other key topics.
Somalia’s Prime Minister Dr. Abdiweli Mohamed Ali received the European Union envoy to Somalia
Alexander Rondes in his office in Mogadishu today. The envoy was accompanied by EU officials and
others while Somali minister for defence Hussein Arab Isse and minister for foreign affairs were also
present in the meeting.
The premier warmly received the EU envoy and thanked him for visiting Mogadishu. He requested the
EU to double its efforts in the restoration of peace and stability in Somalia.
The two leaders discussed the strategic plans of setting up control and authority in the areas reclaimed
from Al-shabab in order to deliver the much needed public services and humanitarian aid to the
people.
The meeting by the premier and the EU envoy also highlighted the upcoming London meeting which
aims at delivering a new international approach to Somalia. The premier stated that the prevalent
security made by the government provides an opportune time for consolidation of such gains.
The special envoy, who was appointed by EU to represent the horn of Africa region, stated that he will
give priority to Somalia. It seems the international community is making concerted effort to the
Somalia issue after the government made important strides in security, the new constitution,
restructuring the parliament, local administrations’ cooperation and good governance.
Resolved "European Union" as: "European Union" {European Union (No Man's Land, )
[pop: 0] <6695072>}, position: 1590, confidence: 1.000000, fuzzy: false
[50.83857,4.37599]
Resolved "London" as: "london"
{com.bericotech.clavin.gazetteer.LazyAncestryGeoName@285818}, position: 2306,
confidence: 1.000000, fuzzy: false
[51.50853,-0.12574]
Resolved "Africa" as: "Africa" {Africa (No Man's Land, ) [pop: 1031833000]
<6255146>}, position: 2586, confidence: 1.000000, fuzzy: false
[7.1881,21.09375]
Resolved "Muna" as: "Muna"
{com.bericotech.clavin.gazetteer.LazyAncestryGeoName@35c23f}, position: 2957,
confidence: 1.000000, fuzzy: false
[20.48794,-89.71387]
Resolved "Mataban" as: "Mataban"
{com.bericotech.clavin.gazetteer.LazyAncestryGeoName@d460}, position: 4161,
confidence: 1.000000, fuzzy: false
[5.20401,45.53353]
Resolved "Birmingham" as: "Birmingham"
{com.bericotech.clavin.gazetteer.LazyAncestryGeoName@28866c}, position: 4841,
confidence: 1.000000, fuzzy: false
[52.48142,-1.89983]
Resolved "Hiran" as: "Hiran"
{com.bericotech.clavin.gazetteer.LazyAncestryGeoName@7b1830}, position: 5208,
confidence: 1.000000, fuzzy: false
[14.44998,45.57068]
Resolved "London" as: "london"
{com.bericotech.clavin.gazetteer.LazyAncestryGeoName@285818}, position: 6129,
confidence: 1.000000, fuzzy: false
[51.50853,-0.12574]
Resolved "East Africa" as: "Portuguese East Africa" {Mozambique (Mozambique, 00)
[pop: 22061451] <1036973>}, position: 7589, confidence: 1.000000, fuzzy: false
[-18.25,35.0]
In: Out:
Check Quality of Matching for Most Frequent Terms!
84. GPS to FIPS (USA)
often need smaller administrative unit ID
We calculate the log likelihood ratios (LLR) of each of the canonical names. It is given as the natural logarithm of the ratio between their normalized frequency of occurrence in each food desert of a region, and that in the matching non-food deserts corresponding to each food desert.
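The log likelihood ratio described here can be written as a small function. This sketch assumes "normalized frequency" means the term's count divided by the total term count in that tract set, and it returns `None` when either frequency is zero; both are choices made for illustration.

```python
import math


def log_likelihood_ratio(count_fd, total_fd, count_nfd, total_nfd):
    """ln of (normalized frequency in food desert / in matched non-food desert)."""
    freq_fd = count_fd / total_fd
    freq_nfd = count_nfd / total_nfd
    if freq_fd == 0 or freq_nfd == 0:
        return None  # undefined without smoothing
    return math.log(freq_fd / freq_nfd)
```

A positive value means the canonical name is relatively more frequent in the food desert, a negative value that it is more frequent in the matched non-food desert.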
Evaluated on ImageNet, achieving a 3.5% error rate. This result won 1st place in the ILSVRC 2015 classification task.
Fig: Mean cross-dataset performance of Insta-101 model trained with increasing number of training samples per category. Note that around 2500 samples per category Insta-101 model reaches the performance of Food-101 model.
“D: AA,H” indicates demographic features pertaining to African Americans and Hispanics
Use averaged vectors of recipes, and perform “geometric transformations in the learned space”