Learning the lingo


From my CSCW 2012 talk about language, gender, and utility on IMDb. Slides and notes available as PDF:
More info about the project available: http://www.casmlab.org/projects/informationbias/

  • Between studies like those mentioned during yesterday’s panel on gender and the popular press, the question of how women’s online participation differs from men’s is a nagging one. As I mentioned during the Q&A part of the panel, I’m troubled by the conflation of gender with sex and of gender with behavior, but we’ve used those broad categories in this research as well. I’ll happily talk about the differences between sex and gender during the Q&A. Issues of sex and gender are messy, especially online, and our project is an attempt to use the signals available to understand a small part of what might be driving the differences we see between the scale of men’s participation and the scale of women’s.
  • Well, our results suggest that women do contribute, some even profusely, but that their contributions are buried. Using data from the IMDb review site, I’ll show you that women on IMDb are adopting the majority voice in their language use, without interacting directly with other reviewers, and that their contributions, even when indistinguishable from men’s, are shoved out of view because they lose the zero-sum game of “making the first page of results”.
  • And all it takes is a few answers to this question: Was the above review useful to you?
  • This kind of question, this form of social voting, is increasingly popular. We see it on Facebook, Amazon, The Hairpin, Guitar Center, even Buzzy. Designers of many open contribution systems ask users to provide feedback about the contributions and then use that feedback to sort the long lists of contributions. We wondered about the relationship between the votes people give and some features of the contributions.
  • Specifically, gender and language.
  • So we set off to IMDb to find out. IMDb is a huge resource about a variety of visual media, and we focused on film. IMDb’s content comes from a variety of sources, and our study focuses on the reviews provided by registered IMDb users.
  • IMDb user reviews look like so. On IMDb, “utility” means the proportion of people who found the review useful out of all of those who voted. So here, 3 out of 3 voters found this particular review of Elvis, the made-for-TV movie, useful.
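That definition of utility is simple enough to restate as code. A minimal sketch, where the function name and vote arguments are my own, not IMDb's:

```python
def review_utility(useful_votes: int, total_votes: int) -> float:
    """Utility = proportion of voters who found the review useful."""
    if total_votes == 0:
        return 0.0  # no votes cast; how IMDb handles this case isn't specified here
    return useful_votes / total_votes

# The Elvis example from the slide: 3 of 3 voters found the review useful.
print(review_utility(3, 3))  # → 1.0
```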
  • The data we used for the study come from 200 prolific reviewers. First, we found the 250 top-rated movies, according to IMDb users. That yielded 21,012 reviews. From those reviews, we identified all the authors and selected the 100 most prolific men and 100 most prolific women. Then, we gathered all the reviews by those reviewers, and those 199,166 reviews comprise our dataset.
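The author-selection step above can be sketched in code. This is an illustrative outline only; the dict keys (`author`, `gender`) and gender codes are assumptions, not IMDb's actual data format:

```python
from collections import Counter

def select_prolific_reviewers(reviews, n_per_gender=100):
    """From a pool of reviews, pick the most prolific authors of each
    gender, ranked by review count. The full dataset is then every
    review those authors ever wrote, not just the reviews in the pool."""
    counts = Counter((r["author"], r["gender"]) for r in reviews)
    top = {}
    for gender in ("M", "F"):
        # most_common() is sorted by count, descending
        ranked = [author for (author, g), _ in counts.most_common() if g == gender]
        top[gender] = ranked[:n_per_gender]
    return top
```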
  • In those nearly 200,000 reviews, men dominated all measures of activity - reviews written and review length - and the gap between the most prolific reviewer in each group is four-fold: 8,167 reviews to 2,061. We focused on the language used in those reviews, examining six linguistic features and one measure of utility.
  • We used regression analysis to determine the impact of gender and time on reviews’ utility and various language features. We chose these features of language because previous research has suggested or found significant differences between men and women on each measure. We included utility because we were curious whether men and women differed there as well.
    Hedges: “more or less,” “rather”; number of hedges, normalized by review length (words)
    Pronoun rate: number of pronouns / words
    Proportion of pronouns that are first person
    Vocabulary richness: diversity of words used; number of unique words / total words
    Word complexity: character-to-word ratio
    Sentence complexity: word-to-sentence ratio
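These measures can be computed directly from review text. The sketch below uses a naive tokenizer and small illustrative word lists; the study's actual tokenizer, hedge lexicon, and pronoun lists are not given in these notes and certainly differ:

```python
import re

# Illustrative subsets only, not the study's lexicons.
HEDGES = {"perhaps", "rather", "may", "might", "somewhat", "probably"}
PRONOUNS = {"i", "me", "my", "we", "us", "our", "you", "he", "she",
            "it", "him", "her", "they", "them", "their"}
FIRST_PERSON = {"i", "me", "my", "mine", "we", "us", "our", "ours"}

def language_features(text):
    """Compute the six language measures for one non-empty review."""
    words = re.findall(r"[a-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n = len(words)
    pronouns = [w for w in words if w in PRONOUNS]
    return {
        "hedge_rate": sum(w in HEDGES for w in words) / n,   # hedges per word
        "pronoun_rate": len(pronouns) / n,                   # pronouns per word
        "first_person_share": (sum(w in FIRST_PERSON for w in pronouns)
                               / len(pronouns) if pronouns else 0.0),
        "vocab_richness": len(set(words)) / n,               # unique words / words
        "word_complexity": sum(map(len, words)) / n,         # characters per word
        "sentence_complexity": n / len(sentences),           # words per sentence
    }
```

For example, `language_features("Perhaps I liked it. We saw it.")` counts one hedge among seven words, and half of the four pronouns are first person.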
  • Our first hypothesis, based on a bunch of literature you can read about in our paper, was that over time, women would write more like men. We expected to see women adapt their writing to the majority voice.
  • Our first measure of language use was “hedging”. Hedges qualify the writer’s commitment to their statement. Some say they are subtle means to avoid responsibility or to obscure the facts. Some say they show politeness or lack of confidence. Either way, existing research on gender and language suggests that women use hedging words much more often than men do. Here, and in other slides where I show users’ content, I’ve left the submissions unedited (except for trimming content before and after). This line from a review of Pulp Fiction is a good example of a hedge-infested review. “Perhaps ‘Pulp Fiction’ may remain tarantino’s opus, perhaps not.” Without hedges, the line is “Pulp Fiction will remain Tarantino’s opus.”
  • Surprisingly, we found that both males and females use about the same (see, I hedged) number of hedges when they first start writing, but then, the women decrease their hedge use and men increase theirs. We definitely did not expect to see men increase their hedge use over time. In this and all the other graphs I’ll show, time is on the x-axis, and the language feature or utility is on the y-axis. Shorter red lines represent women, and longer blue lines represent men.
  • Something else we didn’t expect was an increase in pronouns. When looking at all pronouns - he, she, our, their - we see the gap between women’s use and men’s use that we expect here at the beginning, but then they both increase their pronoun use.
  • Not all pronouns, though. First person pronouns, as a ratio of first person to all pronouns, show marked decreases for women and slight increases for men.
  • Women show a similar drop in their vocabulary richness. Men’s vocabulary richness also decreased, and the two ended up about the same by the time authors wrote many reviews. Keep in mind that “many” here is a couple thousand for women and nearly 9K for men.
  • Sentence complexity showed similar convergence. Here I’ll illustrate sentence complexity with a couple of extremes, first a review with complex sentences and then one that’s less complex.
  • [Read] We see only three sentences here, but it takes up my whole slide.
  • The less complex review - [Read] - has 7 sentences in nearly the same amount of space.
  • Word complexity actually decreased in both groups of reviews. Remember word complexity is a measure of the character-to-word ratio, so longer words are more complex.
  • In summary, we saw females decrease their hedges, increase their pronouns, decrease first person pronouns, decrease vocabulary richness, increase sentence complexity, and decrease word complexity. Nearly all of these changes were expected since we thought they would adopt “more male” language use patterns. What we didn’t expect were the changes in language use we observed among males. An increase in hedges, especially, was surprising.
  • Overall, H1 was supported. Women did write more like men over time.
    - convergence except for hedging
    - something interesting is happening in pronouns
    - hedging: surprising increase from men
  • Our second question was about whether those changes, that adaptation to the dominant voice, would be accompanied by a rise in utility awarded by readers. We expected to see women’s utility scores rise over time as they adopted the majority voice.
  • Clearly, they did. Again, women are red and men blue in this graph. What’s troubling, though, is that even though women showed marked increases in utility, they never catch up. I’ll get to why that matters, but first, a quick stats discussion.
  • Now you’ve seen all my graphs, so I can summarize. When we regress these measures, we see statistically significant main effects for all measures. Women use less rich vocabulary, less complex wording, and less complex sentences, and receive lower utility scores for their trouble. In all cases except hedges and vocabulary richness, those differences also show meaningful effect sizes. Our N of nearly 200K is large enough that we were likely to see statistically significant differences, so we ran effect size calculations on each model to assess the meaning of those differences. In all cases where I report effect size, it’s a small one.
  • By now, you may be wondering about the role of time, and I can touch on that briefly. We did include the number of reviews written in the regression model, and again, we saw significance for all measures. We use “number of reviews written” as a proxy for time. So, rather than measuring time in minutes or weeks, we measure it in increments of reviews: 1 review written, 2 reviews written, and so on up past 8,000.
    However, only 3 measures showed measurable effect sizes: hedges, vocabulary richness, and word complexity. Hedges increased, likely because men went hedge-crazy, and both word complexity and vocabulary richness decreased.
    I like to think of it this way: as they write more reviews, both genders get less stuffy in their writing. It’s like being in intro to film, where on the first day of class everyone feels immense pressure to say something profound, but by the end of the semester, they become comfortable just speaking up.
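To make that modeling setup concrete, here is a generic sketch of the kind of regression described: an ordinary least squares fit of utility on gender and number of reviews written, run on synthetic data with known coefficients. The study's actual model specification and effect-size calculations are not reproduced in these notes; all numbers below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000

# Synthetic stand-in data: gender coded 0 = man, 1 = woman, and the
# number of reviews written so far (the talk's proxy for time).
gender = rng.integers(0, 2, n)
n_reviews = rng.integers(1, 8_000, n)

# Outcome simulated with known, invented coefficients: a gender
# "penalty" of -0.10 and a small positive effect of experience.
utility = 0.6 - 0.10 * gender + 0.00002 * n_reviews + rng.normal(0, 0.05, n)

# Ordinary least squares fit with an intercept column.
X = np.column_stack([np.ones(n), gender, n_reviews])
coef, *_ = np.linalg.lstsq(X, utility, rcond=None)
print(coef)  # approximately [0.6, -0.10, 0.00002]
```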
  • So, H2 is supported, but... Women do increase their utility over time, and at a faster rate than men, but they don’t catch up. You’ll remember I mentioned effect size results when I summarized the regressions, and we didn’t see a meaningful effect for utility for either predictor: gender or number of reviews. Normally we’d say, “oh, well, then the statistical significance doesn’t really matter in the world.” But, yes, there’s the real but... The difference does matter in the world. Because, on IMDb and other sites that use utility to sort their information, the relative utility is all that matters, not the size of the difference.
  • When users arrive at a reviews page on IMDb, they see a screen like this one. Notice the Filter there. It says “best”. IMDb shows 10 reviews per page, so that means that the first 10 reviews, the 10 “best” reviews, are most likely written by men. We already know that very few people click to the second page of any result set, and just like our tendency to stick with the first page of results buries the John Smiths who aren’t great at SEO, it buries women’s contributions to IMDb.
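The burying mechanism is easy to demonstrate in code: sort by utility, take the top 10, and a small average gap in scores decides who makes the first page. The reviews and utility scores below are hypothetical, chosen only to show the effect.

```python
def first_page(reviews, per_page=10):
    """Sort reviews 'best' first by utility and return page one,
    mimicking the default view described above."""
    return sorted(reviews, key=lambda r: r["utility"], reverse=True)[:per_page]

# Hypothetical scores: the gender gap is small (about 0.05 on average),
# but sorting plus pagination turns it into an 8-to-2 first page.
men = [{"gender": "M", "utility": u} for u in
       (0.89, 0.88, 0.87, 0.86, 0.85, 0.84, 0.83, 0.82, 0.81, 0.80)]
women = [{"gender": "F", "utility": u} for u in
         (0.835, 0.825, 0.815, 0.805, 0.795, 0.785, 0.775, 0.765, 0.755, 0.745)]

page1 = first_page(men + women)
print(sum(r["gender"] == "M" for r in page1))  # → 8
```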
  • When I started this talk, I showed you a Room for Debate about women’s contributions to Wikipedia. Surveys suggested very few Wikipedians are women, and the Room for Debate was trying to make sense of those findings. Joseph Reagle argued, among other things, that we may be rationalizing women’s absence as a lack of interest on their part. Anna North wonders if solitarily editing a contribution is antisocial, and that’s why women avoid it. They, and the other debaters, including everyone who participated in yesterday’s panel, may be on to something.
    But what our study shows is that even when women muster enough interest and brave enough antisocial solo editing to contribute to IMDb, the community doesn’t value their contributions, at least not as much as it values men’s. The lower utility scores their reviews receive push their contributions further and further from the top, further and further from eyes that might read them. Notice the passive voice here though. Women’s contributions are buried. Something must be doing the burying for that to be true.
  • IMDb does offer alternative methods for sorting reviews. I find it interesting that they use the label “Filter” for their drop-down. Really, they’re asking you what criteria to use to determine the order of results, but it’s as if the label knows that by sorting, the page is effectively filtering what you’ll see. We can’t see it all; we must necessarily filter.
    IMDb changes the options in this drop-down often. This screenshot is from last Wednesday, and the options may actually be different already. So what happens when you choose the Male/Female filter?
  • Some sort of crazy coloring happens. The backgrounds of the DIVs that hold review content get these faint pastel colors, most of which are laudably gender-neutral, to indicate the gender of their author. It makes for colorful reading, but users have to do some extra work to get here.
  • And we already know that users don’t often do that extra work. So, the design of the system - its sorting mechanism, its default display - effectively buries women’s contributions. We don’t need a systematic rejection of women’s content or overtly sexist moderation for women’s contributions to go unnoticed. A simple “Was the above review useful to you?” will do.
  • So what do we know now that this phase of the study is complete? We know that language convergence can happen even without direct interaction. We know that women receive lower utility scores, even when they write like men. And we know that lower utility scores effectively bury women’s contributions. These results matter because they suggest that small changes in the design of a system can produce large effects on the information accessed. And this burying effect likely plagues many kinds of minority voices.
  • As so often happens in research, our results imply more questions than they answer. Some of the questions we’re interested in answering are about the people involved in the system: who does the reading, writing, and voting? Why do they do it? What other kinds of information bias do we produce when using collaborative filtering and social voting mechanisms? And of course, what else is driving these results? As the paper points out, the total variance we’re able to explain is small, so there’s clearly more to the story. We’re especially curious about the effects of the objects being reviewed. So, we have questions about the people, the technology, and the objects.
  • And I’m interested to hear your questions as well.
  • Controlling for movie is a good idea, one we’ve thought about and that our reviewers mentioned as well. We didn’t include it here because Jahna’s earlier work showed gender was a more significant correlate for utility. And, just to double-check, we looked at the reviews of some “chick flicks” last night and saw that guys wrote the highly-ranked reviews there as well.
  • Male reviewer
    Report style
    10 pronouns
    2 first person pronouns

    1. learning the lingo: gender, prestige and linguistic adaptation in review communities (Libby Hemphill & Jahna Otterbacher)
    2. motivating the study
    3. women’s contributions are buried
    4. social voting
    5. what are the relationships among gender, language, and utility?
    6. study site
    7. review utility
    8. data: 250 top-rated movies; 21,012 unique reviewers; 100 most prolific men, 100 most prolific women from that group; 199,166 reviews written by those 200 people
    9. descriptives (M / F): reviews written (median) 1,187 / 183.5; review length (median) 249 / 223; most reviews 8,167 / 2,061
    10. measures: hedges, pronouns, first person pronouns, vocabulary richness, word complexity, sentence complexity, utility
    11. H1: Over time, women will write more like men.
    12. Perhaps Pulp Fiction may remain tarantinos opus, perhaps not.
    13. hedges
    14. pronouns
    15. first person pronouns
    16. vocabulary richness
    17. sentence complexity
    18. The Descendants takes a dramatic look at the structure of family and the intrinsic bonds that holds its members together. This dark and wonderful drama painfully reveals the fact that many families are broken, often ugly things, yet still mysteriously hold together. It provides a solemn example of how every single piece of a family can be fragmented yet recreated through communal obstacles.
    19. Best picture nominee? Really? It was quite boring for me... I was waiting the end of it almost from the beginning... If there wasnt Clooney, I dont know the reason I would even consider to see it. But he is really amazing actor! His play was awesome!
    20. word complexity
    21. H1: Supported
    22. H2: Over time, women’s reviews will receive higher utility scores.
    23. utility
    24. gender (two columns, statistical significance vs. small effect size): hedges, pronouns, first person pronouns, vocabulary richness, sentence complexity, word complexity, utility
    25. time (two columns, statistical significance vs. small effect size): hedges, pronouns, first person pronouns, vocabulary richness, sentence complexity, word complexity
    26. H2: Supported, but...
    27. real effects
    28. women’s contributions are buried
    29. sorting options
    30. gender sorting
    31. design of the system buries women’s contributions
    32. summary: what we know: language convergence can happen, even without direct interaction; women receive lower utility scores, even when they write like the men; lower utility scores effectively bury women’s contributions
    33. future work: what we don’t know: who reads, writes, and votes, and why; what’s the relationship between social voting and information bias?; what else could be driving these results?; effects of the objects being reviewed, e.g., movie genre, popularity
    34. contact us: Libby Hemphill, libby.hemphill@iit.edu; Jahna Otterbacher, jahna.otterbacher@iit.edu; Illinois Institute of Technology, Chicago, IL
    35. limitations: gender and sex are complicated, and their reporting is also complicated; we don’t know who’s voting; other features (e.g., genre, release date) may also have effects
    36. controlling for movie: Jahna’s 2010 CIKM paper predicted gender using content, style and metadata (including utility) features. It included movie and movie genre as control variables, but they were not significantly correlated to review utility. Gender was by and large the most significant correlate of utility. We did a little sanity check by looking up three “chick flicks” (one mentioned in yesterday’s panel) on IMDb: Bridges of Madison County, Sixteen Candles and even Romy and Micheles High School Reunion. If you sort by gender, you can see that the guys are writing the highly-ranked stuff.
    37. q: What’s your role in this community? a: I write reviews. Hopefully some people read them. I primarily write them because I enjoy it. If people want to read them, thats a bonus. I read other peoples reviews because Im interested in what theyve got to say. I like the varied responses people come up with to the same film! And I visit the forums to join in debates or to answer a question in which a user needs help identifying a movie or an actor. So my role in that sense would be "sharing knowledge".
    38. q: To what extent do you pay attention to how many people mark your reviews as useful? a: None with obscure movies. how often would anybody even READ the review. (sic) And people might be like me, just curious if other people had the same opinion. I dont really go back and see if people found them useful. I like if I get a personal note about a review, but that happens VERY rarely.
    39. q: To what extent do you feel that your reviews are valued by the IMDb community? a: To be honest I really have no idea how to answer that, while I get the odd private message thanking me or criticsing my reviews I dont get that much feedback so in all honesty dont know. I like to think people like them but I cant be sure one way or the other.
    40. HLM: computation challenges given a dataset this large
    41. example reviews: The next in a long line of "found footage" flicks that have been flooding our cinemas over the last few years, Chronicle breaks free of the usual constraints within that sub genre to concoct a truly memorable sci-fi thriller. Retracing the steps of three teenage friends who are gifted with telekinesis after a chance encounter with something (intelligently, the movie never stipulates what exactly), the story focuses on the varying paths they take with their new found talent, but not until they have had some juvenile fun with it first. This is an amazingly accomplished debut feature for writer-director Josh Trank (who co-penned the script with Max "son of John" Landis); his technical veracity is utterly mind-blowing – especially when you consider the shoestring funds he had to work with – and his narrative pacing is impeccable. The icing on the already yummy cake is the marvellous CGI that allows our protagonists to fly, crush cars and stop baseballs in mid air – all seamlessly and photo-realistically. Chronicle is a tremendous achievement in low-budget, big-concept filmmaking.