
Food and Culture





Food and Culture

  1. 1. Food and Culture CSS @GESIS Claudia Wagner GESIS & University of Koblenz 6nd Nov 2014, Yahoo Labs, Spain
  2. 2. 18.11.2014 Claudia Wagner 2
  3. 3. Research and Services at GESIS Survey Design and Methodology Computer Science and Information Science Raise the standards of surveys at all phases of the survey life cycle Gender studies, Political science (e.g., GLES), Values and Attitudes research (e.g. ALLBUS), ... Computational Social Science Knowledge Discovery, Information Retrieval, Information Extraction, … Social Science Research 18.11.2014 Claudia Wagner 3
  4. 4. CSS Agenda @GESIS. PAST: Support traditional Social Science research with computational methods and tools. PRESENT: Develop new instruments to tap into the potential of found data and crowds → building a telescope for the Social Sciences. FUTURE: Online impacts offline! Build new algorithms and tools to shift the current configurations of societies towards better futures.
  5. 5. Food 18.11.2014 Claudia Wagner 5
  6. 6. Data • ~ 470k Unique Users ~1 Mil. Page Impressions per week • – 2,27 Mil. Unique User in July 2014 – 1.29 million Visits (12.1 Mio. PI) in December 2008 • – 11,05 Mil. Unique User in July 2014 – 28 Mio. Visits and 242 Mio. PI in December 2010 6 Sources:
  7. 7. Recipe Popularities [chart; vertical axis from 0 to 50000]
  8. 8. Ingredient Popularities [chart; vertical axis from 0 to 120000]
  9. 9. Temporal Stability 18.11.2014 Claudia Wagner 9
  10. 10. Normalized Access Volume per Weekday [chart; series: Meat, Carbohydrates, Fish, Vegetable, Alcohol; the plotted value Z is a normalization of the access volume X at time t]
  11. 11. 18.11.2014 Claudia Wagner 11
  12. 12. 18.11.2014 Claudia Wagner 12
  13. 13. 18.11.2014 Claudia Wagner 13
  14. 14. 18.11.2014 Claudia Wagner 14
  15. 15. Change Rate per Weekday [chart; series: Meat, Carbohydrates, Fish, Vegetable, Alcohol; the change rate R_t is computed from the values F_i(t), with a sum over j = 1, …, N]
  16. 16. Most Popular Recipes • Berlin: • Frankfurt: • Kiel: • Vienna:
  17. 17. City Similarities 18.11.2014 Claudia Wagner 17
  18. 18. Bundesarchiv Bild 173-1282, Berlin, Brandenburger Tor, Wasserwerfer 18
  19. 19. Regional Similarities 18.11.2014 Claudia Wagner 19
  20. 20. Regional Similarities Berlin East West 18.11.2014 20 West East
  21. 21. Culture 18.11.2014 Claudia Wagner 21
  22. 22. Wikipedia 27 language communities 31 cuisines 22
  23. 23. Cultural Relations Similarity Understanding Affinity 18.11.2014 Claudia Wagner 23
  24. 24. Cultural Similarity. Jaccard similarity: $\mathrm{sim}(A, B) = \frac{|A \cap B|}{|A \cup B|}$. German cuisine and Italian cuisine: Wheat Beer, Sauerkraut, Riesling, Pasta, Sausage, Tortano, Pizza, Parmigiano. $\mathrm{sim}(\text{German cuisine}, \text{Italian cuisine}) = \frac{1}{8}$
  25. 25. Cultural Similarity between Neighbors 18.11.2014 Claudia Wagner 25
  26. 26. Cultural Understanding Understanding 2 / 5 0 / 6 Understanding the Italian food culture Wikipedia edition Used concepts “Native” definition 18.11.2014 26
  27. 27. Cultural Understanding 18.11.2014 27
  28. 28. What may explain Cultural Understanding? • Create for each country a list of countries ranked by where most of its immigrants come from • Create for each country a list of countries ranked by how similar their values and beliefs are according to ESS Pair ρ (p-value) wiki – ess 0.18 (0.00019) wiki – migration 0.36 (1.74e-22) 28 Germany 18.11.2014 Claudia Wagner
  29. 29. Cultural Affinity • View statistics of cuisine pages in different language editions • How much more attention than we would expect does language community A pay to the culture of community B? 18.11.2014 29
  30. 30. Cross-cultural affinities. But what explains them? GERMANY: de/German (+0.1464), de/French (+0.0850), de/Italian (+0.0051), de/Dutch (+0.0037). TURKEY: tr/Croatian (+0.0173), tr/Serbian (+0.0114), tr/Polish (+0.0114). ρ=0.25
  31. 31. What drives cross-cultural attention? Popularity Model Popularity-Affinity Model es it de es it de 18.11.2014 Claudia Wagner 31
  32. 32. What drives cross-cultural attention? Popularity Model Popularity-Affinity Model 18.11.2014 32
  33. 33. Self-Focus & Regional Bias 18.11.2014 Claudia Wagner 33
  34. 34. Summary • Affinities between language communities are present in Wikipedia and drive the attention process • Cultural understanding can to some extent be explained by migration • Cultural similarities inferred from Wikipedia are pretty plausible (CrowdFlower) • Relation between similarity, understanding and affinities? – Understanding and affinity: -0.35 – Similarity and affinity: 0.27 – Similarity and understanding: 0.19
  35. 35. Thank you! Questions? Comments? Lunch? @clauwa
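
The Jaccard similarity on slide 24 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `jaccard` and the way the eight concepts are split between the two cuisines are assumptions (the transcript does not say which concept is shared; "Sausage" is assumed here so that the result matches the 1/8 on the slide).

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two collections of concepts."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(union)


# Hypothetical concept sets: the split between the cuisines is an assumption.
german = {"Wheat Beer", "Sauerkraut", "Riesling", "Sausage"}
italian = {"Pasta", "Sausage", "Tortano", "Pizza", "Parmigiano"}

print(jaccard(german, italian))  # 1 shared concept out of 8 in total
```

In the talk the sets are built from Wikipedia outlinks of the cuisine articles; any collection of hashable items works with this sketch.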

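Slide 28 compares ranked country lists and reports ρ with p-values. Assuming ρ denotes Spearman's rank correlation (the slide does not name the coefficient), the comparison could look roughly like the sketch below. The function names and the example numbers are hypothetical, and ties are not handled.

```python
def ranks(values):
    """Rank positions (1 = largest value); assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def spearman_rho(x, y):
    """Spearman's rank correlation for two equally long, tie-free score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical scores for four partner countries, in the same country order.
wiki_understanding = [0.40, 0.10, 0.25, 0.05]
migration_counts = [900, 300, 100, 50]
rho = spearman_rho(wiki_understanding, migration_counts)
```

For real data with ties and p-values, a statistics library would be the better choice; this sketch only shows the rank comparison itself.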
Editor's Notes

  • The stuff I am presenting is very much ongoing work. The goal is to give you an impression of the type of CSS we do at GESIS and to throw out some ideas and thoughts about the potential usefulness and limitations of using observational data for tackling social science research questions.
  • During my PhD I interned at the OU, PARC and HP. My mix of interests, which ranges from pure CS methods to behavioral analysis of users or user groups, brought me to GESIS, where in 2013 the first CSS department in Europe was founded.
  • Established in 1986. It is publicly funded. Huge potential of found data for the social sciences → found data are data which are not generated for a specific purpose (e.g., server logs). Found data are nothing new for the quantitative social sciences, and social scientists have a pretty good understanding of WHEN to use them and WHEN not. For example, when studying alcohol consumption with surveys they will miss teenage drinking. Studying trash bins near schools can correct the survey results.
  • Studying online traces to learn something about the offline world is only part of the CSS story, and it is the part where you have to argue a lot with social scientists. The other part of the story: How does the availability of these data shape our behavior? What are the societal implications of these data and the algorithms that decide which data is accessible to whom and when? Understanding these questions is important for 2 reasons: 1) if algorithms, e.g., reinforce gender biases online, then you want to know that, and maybe new dimensions for evaluating algorithms are needed. 2) We need to understand the bias that algorithms introduce in the data we observe!
    Project or a discussion: Gender biases in Wikipedia. Wikipedia reflects an unbalanced world, so males have a higher indegree, get a higher PageRank and so on. Most algorithms will reinforce the bias by making these pages more visible.

    Example: BM paper on ranking algorithms for articles about notable people in Wikipedia. David Lazer's group: personalization of Google searches and price comparison sites.
  • The 2 research projects I will present both fall into the second category: use online data to understand the offline world.

    Food is very central to all human beings and it is affected by many factors: e.g. economics, social factors, cultural factors, biology… For social scientists it is mainly interesting because it affects the health status of society and it helps us to learn about social groups and cultural differences. Most of our food-related preferences are learned through experience.
    Nowadays people interact with food online a lot. One of the first things…

    Kochbar is 4 times larger than ichkoche. Chefkoch is 5 times larger than ichkoche.
  • Server log data from the three largest recipe platforms in the German-speaking area.
    Recipe popularity distributions → compare them across cities or over time, look at the relation between the weather and what people eat, or at the relation between city (or neighbourhood) characteristics and what people eat.
  • Then the inferred popularity of tomatoes would be 60k. That is how we generate an ingredient popularity distribution. The question now is whether this ingredient popularity distribution tells us something about the taste of the people who generated it, or whether it is just a side product of the ingredient universality distribution, which tells us in how many recipes an ingredient is used.
    The idea is similar to: if the ingredients used in recipes were randomly selected, what number of shared flavour compounds would we observe, and what do we observe empirically in different cuisines?
  • One of the first things we did, besides looking at the shape of the distributions, was to analyze how stable the preferences are over time: how much does the popularity ranking change during the course of a week? We use a top-weighted overlap measure and compare two ranked lists of items!
    Ingredients tend to be more stable since the head of the distribution does not change. Salt, sugar and oil are always the most popular ingredients. However, the recipe popularity changes at the weekend, and some key ingredients change as well.
  • Since we know from offline dietary studies that certain types of ingredients are especially popular during the weekend (e.g. meat), we looked at the popularity of different types of ingredients during the course of a week.
    And we indeed see that people eat more meat on weekends (turkey and chicken are not seen as meat). Carbohydrates peak on workdays, fish on Friday, vegetables at the beginning of the week, and alcohol at the weekend.
    Is this pattern universal?
  • Kochbar
  • So we observe a slight shift from weekday towards weekend preferences and a pretty clear cut from Sunday to Monday. Is this pattern universal?

    So we picked some ingredients, but what happens if we look at all ingredients?
  • What the hell are these people in Frankfurt checking out?
  • Most cities are extremely similar in their recipe preferences!
  • Of course Germany is special in the sense that it was divided for more than 40 years (1949 until 1990).
    So if the platform were introducing a very strong bias, we might not even be able to see a difference between East and West, right?

    Still there are striking differences between east and west and at GESIS we actually have the data to compare how attitudes, beliefs and values of Germans in East and West Germany changed over the last 10 years. Surveys started in 1990 and go on until today! So now we can wonder if also the dietary preferences are distinct.
  • Federal states in the West are more similar to each other than federal states in the East are.
    Cities across East and West Germany are less similar to each other than cities within either East or West.
  • Regions in the West are more similar to other West regions than regions in the East are to other East regions. But in general the in-group similarity is on average higher than the across-group similarity.
  • Food is one dimension of culture, since social groups often differentiate themselves by what they eat or don't eat, when they eat or don't eat, and how they prepare their food. Marco Calvo → situations of migration
    Traditional: culture as shared meanings and beliefs, measured by survey. BUT the researcher himself has a cultural background. Hofstede, or Alavi in 2007: Collective Orientation Scale.
    Pierre Bourdieu, La Distinction: he argues that taste and related practices are used to differentiate oneself from others.

    Alternative perspective on culture was introduced by West and Graham in 2004: they use the origin of language to define cultural distance and found that 40% of the shared values can be explained by language.
    Another example is the Science article by Michel in 2011, who uses digitized books to quantify culture.
  • Server log data are only available for DE, AT and CH.
    Wikipedia allows us to observe how language communities preserve their culture online by describing it, and how it is consumed. Every person perceives their own culture and other cultures through their own ethnic and cultural lenses. Therefore biases and misunderstandings may be present on Wikipedia. We hypothesize that those biases and misunderstandings reflect the real world, but also that Wikipedia impacts the real world and has the power to hinder or foster cross-cultural misunderstandings.

    We associate cuisines with language communities, e.g. the Spanish-speaking community with the Spanish cuisine (though the South American cuisine), and the German-speaking community with the German, Austrian and Swiss cuisines.
    Article „Spanische Küche“ represents the view of the Spanish cuisine as seen by the German speaking language group

    3 datasets: view statistics, word count and outlink count (first and second hop) (explain outlinks!) View counts correlate very strongly (above 0.9) with outlink count
    Not every cuisine is described in each Wikipedia
  • So we wanted to develop an automated method to quantify the relations between different language communities.
    How similar are their cultures?
    How well do they understand the culture of other groups?
    How much interest do they have in the culture of others? Affinity biases?

    What is the cultural similarity between Germany and Italy, and what are the cultural understanding and cultural affinity from the perspective of the German-speaking population and from the perspective of the Italian-speaking population?
  • Using outlinks and their overlap
    Local and global perspective
  • On average each country is 1.5 times more similar to its neighbours than to distant countries. We also set up a CrowdFlower task and asked crowdworkers to assess which pair of cuisines is more similar. We only compared top-ranked versus low-ranked pairs and got more than 99% correct. We had 10 judgements per pair and 225 pairs.
  • Italians, Swedes, Germans. Swedes would understand the Italian cuisine better than Germans do.
  • Small overlaps: most food cultures are badly understood. Some food cultures, like those of France, Italy and Turkey, are however better understood by most language communities. Italian and French cuisine are famous. But what about Turkey???
    The good understanding of the Turkish cuisine by different language communities can be explained by the fact that many immigrants from Turkey moved to other EU countries. For example, in Germany, where the largest number of non-nationals live, 22% of them are Turkish residents. In the Netherlands and Denmark most immigrants are Turkish as well. Besides Turkey, Romania, Poland, Estonia and Russia also have many residents living in other European countries.

    The largest numbers of non-nationals living in the EU on 1 January 2013 were found in Germany (7.7 million persons), Spain (5.1 million), the United Kingdom (4.9 million), Italy (4.4 million) and France (4.1 million).
  • External Sources: no proper ground truth, we assumed that cultural understanding can have two explanations
    Cultural understanding due to cultural similarity
    Cultural understanding due to exchange (migration)
    Migration data from the Global Bilateral Migration Database
    Migration seems to explain cultural understanding better than cultural similarity does
    External sources do not correlate well with each other – not valid?
  • F(l,o): how often does a language community l view an object o, compared to how much it views all other objects. Normalized by the popularity of the objects.
  • We compared the list of country pairs ranked by their Wikipedia affinity values with the list ranked by their Eurovision affinity values. We found a correlation of 0.25. One can see that the distributions of affinity values are also different. Most affinities which we infer from Wikipedia are around zero (no special affinities), with some strong positive exceptions.
  • So what role does affinity play on Wikipedia? How can we reproduce the cross-cultural affinity score distribution which we inferred from Wikipedia? IDEA: simulate the cross-cultural attention process (view statistics) and infer affinity values from these synthetic view data. We use 2 simulation models which make different assumptions about what drives the cross-cultural attention process.
    The popularity model assumes that the global popularity of countries/cuisines is the only factor which impacts how each agent decides to distribute its attention. The popularity-affinity model includes directed, edge-specific weights.
    We use both models to generate synthetic cross-country click distributions and compute affinity values for the synthetic click distributions. We compare the distributions of affinities obtained from the empirical data with the synthetic ones.
  • Models → generate synthetic cross-cultural view data and infer synthetic affinity scores. Compare the distribution of synthetic scores with the distribution of empirical scores → using a divergence. If the divergence (y-axis) is ZERO we have a perfect fit: the 2 distributions are the same.
    The popularity of countries/cuisines follows an exponential distribution. Lambda describes the skewness of the distribution. If lambda is too big (i.e. the distribution is too skewed) the fit is bad. That means an exponential with λ <= 1 is best. Some cuisines are more popular, but not much more. Popularity-affinity model: in addition to the global popularity we introduce affinities between countries which follow a normal distribution. Most countries have no special affinities (i.e. are around zero), with some outliers.
  • Most countries have a strong self-focus bias. Especially when you look at views.
    Around half of our language communities show a slight positive regional bias and half show a slight negative one. But the affinity values are rather low and close to zero.
  • Maybe we need to accept the fact that we have to come up with our own scales which describe what we can measure online in a platform-independent way. And maybe we also need to acknowledge the fact that each platform introduces a bias which we need to understand, or we need at least several platforms to make sure that we are fine.
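
The note on F(l,o) describes affinity as the attention a language community pays to an object relative to what the object's overall popularity would suggest. One way to read that, sketched below under explicit assumptions, is "observed share of views minus expected share of views". The exact normalization used in the talk is not given in the notes, so the formula, the function name `affinity` and the view counts are all hypothetical.

```python
def affinity(views, lang, obj):
    """Observed minus expected share of views that `lang` gives to `obj`.

    `views[l][o]` is the number of views of cuisine page `o` in language
    edition `l`. The expected share is the object's share of all views
    across all editions (an assumed normalization, not taken from the talk).
    """
    lang_total = sum(views[lang].values())
    grand_total = sum(sum(row.values()) for row in views.values())
    obj_total = sum(row.get(obj, 0) for row in views.values())
    observed = views[lang].get(obj, 0) / lang_total
    expected = obj_total / grand_total
    return observed - expected


# Hypothetical view counts for two language editions and two cuisine pages.
views = {
    "de": {"Italian": 60, "Turkish": 40},
    "sv": {"Italian": 80, "Turkish": 20},
}
de_turkish = affinity(views, "de", "Turkish")  # positive: more than expected
de_italian = affinity(views, "de", "Italian")  # negative: less than expected
```

A value near zero means no special affinity, which is what the notes report for most community pairs.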
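
The note on ingredient popularity mentions an inferred popularity of 60k for tomatoes. A plausible reading, assumed here, is that an ingredient's popularity is the sum of the views of all recipes that contain it, while its universality is the number of recipes that use it. The sketch below illustrates both. The function names, recipes and view counts are hypothetical and chosen only so that tomatoes come out at 60k.

```python
from collections import Counter


def ingredient_popularity(recipe_views, recipe_ingredients):
    """Sum recipe views per ingredient (assumed aggregation rule)."""
    popularity = Counter()
    for recipe, view_count in recipe_views.items():
        # set() ensures an ingredient listed twice in a recipe counts once
        for ingredient in set(recipe_ingredients.get(recipe, ())):
            popularity[ingredient] += view_count
    return popularity


def ingredient_universality(recipe_ingredients):
    """Count in how many recipes each ingredient is used."""
    counts = Counter()
    for ingredients in recipe_ingredients.values():
        counts.update(set(ingredients))
    return counts


# Hypothetical data for illustration only.
recipe_views = {"tomato soup": 40000, "caprese": 20000, "apple pie": 5000}
recipe_ingredients = {
    "tomato soup": ["tomato", "salt"],
    "caprese": ["tomato", "mozzarella"],
    "apple pie": ["apple", "sugar"],
}
popularity = ingredient_popularity(recipe_views, recipe_ingredients)
universality = ingredient_universality(recipe_ingredients)
print(popularity["tomato"], universality["tomato"])
```

Comparing the two counters is one way to ask the question raised in the notes: whether popularity reflects taste or merely how widely an ingredient is used.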