We present a study of the relationship between gender, linguistic style, and social networks, using a novel corpus of over 14,000 Twitter users. Prior quantitative work on gender often treats it as a female/male binary, but that is problematic at a theoretical level and descriptively inadequate. By clustering Twitter users by the words they use, we find a natural decomposition of the dataset into various styles and topical interests. Many of these clusters have strong gender orientations, but they offer a more accurate reflection of the multifaceted nature of gendered language styles. Previous corpus-based work has also had little to say about individuals whose linguistic styles defy population-level gender patterns. To identify such individuals, we train a statistical classifier and measure the classifier's confidence for each individual in the dataset. Examining individuals whose language does not match the classifier's model for their gender, we find that their social networks include significantly fewer same-gender connections, and that, in general, social network homophily is correlated with the use of same-gender language markers. I hope to persuade you that the combination of computational methods and social theory offers new perspectives on how gender emerges as individuals position themselves relative to audiences, topics, and mainstream gender norms.
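The classifier-confidence step described above can be illustrated with a minimal sketch. The abstract does not say which statistical classifier was used, so the Naive Bayes model here is an assumption, and the user IDs, labels, and word tokens are hypothetical toy data. For brevity the sketch scores each user with a model trained on all users; a real analysis would score held-out users.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(docs):
    """docs: list of (label, words). Returns per-label word counts, label counts, vocabulary."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for label, words in docs:
        label_counts[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    return word_counts, label_counts, vocab

def posterior(model, words):
    """Return {label: P(label | words)} using add-one smoothing."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    log_scores = {}
    for label, n_docs in label_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)
        for w in words:
            if w in vocab:  # ignore words never seen in training
                score += math.log((word_counts[label][w] + 1) / (total_words + len(vocab)))
        log_scores[label] = score
    top = max(log_scores.values())
    exp_scores = {label: math.exp(s - top) for label, s in log_scores.items()}
    z = sum(exp_scores.values())
    return {label: v / z for label, v in exp_scores.items()}

# Hypothetical toy corpus: (user id, self-reported label, words used).
users = [
    ("u1", "F", ["a", "b", "c"]),
    ("u2", "F", ["a", "b", "d"]),
    ("u3", "M", ["x", "y", "z"]),
    ("u4", "M", ["x", "y", "w"]),
    ("u5", "F", ["x", "y", "z"]),  # word use resembles the other label's pattern
]

model = train_naive_bayes([(label, words) for _, label, words in users])

# Confidence = probability the classifier assigns to the user's own label.
confidence = {uid: posterior(model, words)[label] for uid, label, words in users}
flagged = [uid for uid, c in confidence.items() if c < 0.5]
```

Users in `flagged` are those whose word use the model associates more strongly with the other label; in the study, it is this kind of user whose social networks were then examined.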
How to Teach Pronunciation: Getting Started - Judy Thompson
We asked hundreds of ESL/EFL teachers, "If I could wave a magic wand and fix one thing to help you teach Pronunciation - what would it be?" The number one answer was - How do I start? I created a webinar to answer this great question (link to recording of the webinar http://bit.ly/1SW62M7) and these are the slides from that webinar.
Speak Up: Encouraging Students to Speak in the Classroom - Julie Hanks
Getting students to speak in class is challenging. Given the opportunity for classroom participation, students may choose not to speak for a host of cultural, social and personal reasons. Having previous experience in Asia, the presenter will discuss these reasons, and provide classroom-tested suggestions on how to get students speaking.
Introduction to Language and Linguistics 005: Morphology & Syntax - Meagan Louie
Introduction to Language and Linguistics 005: Morphology & Syntax - In which we review the notion of morphological restrictions (word-internal distributional patterns), and introduce the idea of syntactic restrictions (word-external distributional patterns). Frame Sentences are introduced as a diagnostic for lexical category, and Phrase Structure Rules are introduced as a way to account for Frame Sentences (i.e., patterns in lexical word order). Hockett's design feature PRODUCTIVITY is discussed, and the difference between the Chomsky-style generative approach and a Skinner-style behaviourist approach is mentioned.
Speaker Profession
Xiomara Mejia, Melanie Sanoff, Claudia Lemanski, Zijie Yuan, Angelique Desjardins, Martin Prokai, Gabrielle Peterson, Chidera Udeh
Crystal Akers - Add your name to the group's slide, and someone should write the research question here.
What is the exact wording of the research question(s) you would like to investigate?
What kind of study do you propose?
Are you analyzing screen data? Conducting interviews? Presenting an experimental survey with a stimulus you've designed?
What is your hypothesis?
How does your study relate to our course?
Consider connections to the methods of the study, the linguistic variable in the study (words? phrasing? sounds?), and how the social factor you've selected (Participants/Setting/Topic/Function) relates to the linguistic variable.
Crystal Akers - Use your responses from Collab 3 to ensure some consensus on these questions. Then, each person should post a new slide, following the instructions in Collab 4.
Research Question
How does the profession of a speaker influence their formality on social media platforms?
Crystal Akers - Just go with it! :)
Melanie Sanoff - Is everyone okay with this final question?
Angelique Desjardins - I like this form of the question!
Hypothesis
If a speaker is more formal on social media, then their profession is more formal than that of someone who speaks less formally.
This will be seen in the data through:
Number of sentences
Hashtags
Word choice
Emojis
etc.
Crystal Akers - Everyone can add their own slides showing different screen data accounts and how they are analyzed. See Xio's slides for some examples of how this can work.
Study Type
Analysis of screen data (Twitter Data)
Data Collection
Collect Data from social media
Twitter/Facebook
Collection of:
Number of Abbreviations
Number of Emojis used
Punctuation
Grammar (Errors made)
Complexity of Words chosen
Ex: Would someone with no higher education be able to understand a politician's word choices without the help of a dictionary?
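The counts listed above could be collected with a short script rather than by hand. The sketch below is only an illustration: the function name `tweet_features` is hypothetical, the sentence splitter is crude (it splits on runs of terminal punctuation), words inside hashtags are also counted as words, and the emoji pattern covers only two common Unicode ranges, so real emoji counts may be higher.

```python
import re

# Rough emoji pattern (assumption): two common Unicode blocks only.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")
WORD_RE = re.compile(r"[A-Za-z0-9]+(?:['’][A-Za-z]+)?")

def tweet_features(text):
    """Return simple formality-related counts for one tweet."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = WORD_RE.findall(text)
    n_sentences = len(sentences)
    return {
        "sentences": n_sentences,
        "words": len(words),
        "hashtags": len(re.findall(r"#\w+", text)),
        "emoji": len(EMOJI_RE.findall(text)),
        "punctuation": len(re.findall(r"[.,;:!?]", text)),
        "words_per_sentence": len(words) / n_sentences if n_sentences else 0.0,
    }

# One tweet quoted in these slides, plus a made-up example with a hashtag and an emoji.
from_slides = tweet_features("We are not in great shape .")
made_up = tweet_features("Big day! #launch 😀")
```

Running the same function over every tweet in a set and summing the results would give the per-account totals that the table asks for.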
Crystal Akers - Analyze these sample slides. What linguistic data do you receive from them? Showing a table would be great to let your viewers understand what you're examining (say, # of words/sentence?) and what the exact figures are (how many words/sentence, precisely?)
Data
Twitter Handle | Total # sentences in set | Total # of words in set | Total # of hashtags in set | Total # of emoji in set | Total # of abbreviations in set | Grammar Errors in set | Punctuation in set
Cristine Rotenberg @nailogical | 21903110
Governor Phil Murphy @GovMurphy | 58700005
Politicians vs Celebrities
Politicians' tweets vs. celebrities' tweets
iamcardib
@iamcardib
·Mar 27
We are not in great shape .
Bernie Sanders
@BernieSanders
·22h
There are people today who may well have symptoms of coronavirus, but who have to go to work because they don’t have any paid family or medical leave. How does that happen in t.
Reading Response By R.C. Lewontin, Confusions about Human Races.docx - sodhi3
Reading Response
By R.C. Lewontin, Confusions about Human Races, published Jun 07, 2006, http://raceandgenomics.ssrc.org/Lewontin/
The author used the prose form of writing, in which the article has paragraphs only, but no sections. While there is a flow of ideas, this essay is not easy to read because of the extensive writing, which makes the reader feel overloaded with information. In addition to that, the document does not have any chronology of events, and the author does not state how he intends to write his ideas.
The central claim of this article is the fact that many people confuse human races. He holds that “race” is a biological aspect of variations in the human species, but the world takes it for a social construct of classifying people.
Lewontin holds that “As a biological construct rather than a social one, people cease to see “race” as a significant reality that characterizes the species of humans.”
Keywords – race, genetics, human, biological, and variations
New Vocabulary: Australian Aborigines, Negritos, Inuit, Tay-Sachs disease, Ashkenazi Jew.
Summary: Lewontin’s Confusions about Human Races is a reminder to the people about the concept of human race. Widely touted as a social construct, “race” is a biological concept that outlines the realities that characterize the human species. Lewontin argues that researchers and scholars produce many objective natural divisions confirming that racial categories are representations of genetic differences and not social or historical factors. He uses Leroi Armand Marie’s essay in the Op-Ed section of The New York Times - March 14, 2005, as evidence for his argument. Lewontin adds that Leroi’s work points out the confusion about the factors of racial categorization as well as the recent erroneous deductions about the relevance of such identifications of the race for medical practice.
The author holds that four facts about human variation, on which the world seems to agree, are the ultimate evidence for understanding race as a biological concept. First, the human species has immense genetic variation from one individual to another. Second, the largest share of human variation (nearly 85 percent) occurs among people within local linguistic or national populations, such as the French, Mexicans, and Japanese. Third, some genetic traits, such as skin color, the form of hair, the shape of the nose, and some blood proteins like the Rhesus factor, vary together, so that people with dark skin color are also likely to have dark, curled hair, broad noses, and a high likelihood of Rh blood type. Fourth, genetic differences are breaking down due to migration and intergroup mating; although this existed in the past, it is now widespread and occurring at a high rate.
Questions: Despite being informative, this article poses a few questions in a reader’s mind. What is the solution to the confusion about race? Does it mean that race is only biological and has no relationship to the s ...
Community Teaching Plan Teaching Experience Paper 1Unsatisf.docx - donnajames55
Community Teaching Plan: Teaching Experience Paper
1 Unsatisfactory: 0.00%
2 Less than Satisfactory: 75.00%
3 Satisfactory: 83.00%
4 Good: 94.00%
5 Excellent: 100.00%
80.0% Content
30.0% Comprehensive Summary of Teaching Plan With Epidemiological Rationale for Topic
Summary of community teaching plan is not identified or missing.
Summary of community teaching plan is incomplete.
Summary of community teaching plan is offered but some elements are vague.
Focus of community teaching is clear with a detailed summary of each component. Rationale is not provided.
Focus of community teaching is clear, consistent with Functional Health Patterns (FHP) assessment findings and supported by explanation of epidemiological rationale.
50.0% Evaluation of Teaching Experience With Discussion of Community Response to Teaching Provided. Areas of Strength and Areas of Improvement Described
Evaluation of teaching experience is omitted or incomplete.
Evaluation of teaching experience is unclear and/or discussion of community response to teaching is missing.
Evaluation of teaching experience is provided with a brief discussion of community response to teaching.
A detailed evaluation of teaching experience with discussion of community response to teaching and areas of strength/improvement is provided.
Comprehensive evaluation of teaching experience with discussion of community response provided along with a detailed description of barriers and strategies to overcome barriers is provided.
15.0% Organization and Effectiveness
5.0% Thesis Development and Purpose
Paper lacks any discernible overall purpose or organizing claim.
Thesis is insufficiently developed and/or vague; purpose is not clear.
Thesis is apparent and appropriate to purpose.
Thesis is clear and forecasts the development of the paper. It is descriptive and reflective of the arguments and appropriate to the purpose.
Thesis is comprehensive; contained within the thesis is the essence of the paper. Thesis statement makes the purpose of the paper clear.
5.0% Paragraph Development and Transitions
Paragraphs and transitions consistently lack unity and coherence. No apparent connections between paragraphs are established. Transitions are inappropriate to purpose and scope. Organization is disjointed.
Some paragraphs and transitions may lack logical progression of ideas, unity, coherence, and/or cohesiveness. Some degree of organization is evident.
Paragraphs are generally competent, but ideas may show some inconsistency in organization and/or in their relationships to each other.
A logical progression of ideas between paragraphs is apparent. Paragraphs exhibit a unity, coherence, and cohesiveness. Topic sentences and concluding remarks are appropriate to purpose.
There is a sophisticated construction of paragraphs and transitions. Ideas progress and relate to each other. Paragraph and transition construction guide the reader. Paragraph structure is seamless.
5.0% Mechanics of Writing (includes spelling.
Presenting Diverse Political Opinions: How and How Much (CHI 2010) - Sean Munson
Is a polarized society inevitable, where people choose to be exposed to only political news and commentary that reinforces their existing viewpoints? We examine the relationship between the numbers of supporting and challenging items in a collection of political opinion items and readers’ satisfaction, and then evaluate whether simple presentation techniques such as highlighting agreeable items or showing them first can increase satisfaction when fewer agreeable items are present. We find individual differences: some people are diversity-seeking while others are challenge-averse. For challenge-averse readers, highlighting appears to make satisfaction with sets of mostly agreeable items more extreme, but does not increase satisfaction overall, and sorting agreeable content first appears to decrease satisfaction rather than increasing it. These findings have important implications for builders of websites that aggregate content reflecting different positions.
My BookshelfTOCAnnotation menuDownloads PrintSearchProfileHelp5..docx - roushhsiu
5.2 Schemas and Scripts
Our automatic system allows us to make shortcuts and come to conclusions without taxing the conscious system (Shah & Oppenheimer, 2008). In fact, when our resources are depleted we are more likely to use the shortcuts offered by the automatic system (Masicampo & Baumeister, 2008). The automatic system has two ways of doing this; one focuses on things like objects or people, while the other focuses on events, what they include, and how they are sequenced.
Schemas
Figure 5.1: Schemas
Your schema for a baseball game may include a baseball diamond, a salute to the American flag, and peanuts.
Dorling Kindersley RF/Thinkstock, iStockphoto/Thinkstock, iStockphoto/Thinkstock
Chapter 2 introduced the idea of schemas as knowledge structures that organize what we know and that can affect how we process information. Self-schemas are knowledge structures about the self, but we can have schemas about many other things in our world, such as animals, objects, places, and concepts (see Figure 5.1). When we are making judgments, schemas may affect those judgments. For example, a boss might have a schema about an employee as a good, reliable worker. If that employee is late one day, the boss makes a different judgment about that employee than she would if she had a schema about that employee as lazy and irresponsible. Because of the positive schema about her employee, the boss might also quickly remember the employee's contributions to past projects, eventually concluding that the employee had a good reason to be late. While schemas can help us remember things by organizing them into preconceived structures, they may also create false memories for us (Lampinen, Copeland, & Neuschatz, 2001). If you were to sit in a professor's office for several minutes and then, hours later and outside the office, were asked what you saw there, your schema could help you answer. You expect to see bookshelves with books, a desk, a computer, a stapler, and some pens in a professor's office. As you remember what was in the office, your existing schema might help you remember that you saw a bookshelf. But the schema may lead you to remember something that was not there. If you expected to see a stapler, you might report that a stapler was there, even if it was not.
Schemas
How schemas influence behavior.
Critical Thinking Questions
Why are schemas considered a fundamental part of social psychology? How does a victim's schema put people at a higher risk of being victimized?
Schemas can also help us remember items because they violate a schema. If you were to see a stuffed teddy bear in a professor's office, you might remember and recall it because it was outside of your typical professor's-office schema. This type of effect may have serious consequences when we examine the role of schema ...
Variation in speech tempo: Capt. Kirk, Mr. Spock, and all of us in between - Tyler Schnoebelen
Kendall (2009) shows that speech rate correlates to region, ethnicity, gender, and age. Beyond average rates, acceleration and deceleration matter. Psychologists and musicologists link tempo not only to demographic categories, but to emotions and personality-types, too.
An analysis of five Star Trek episodes shows how differences in tempo match differences in characters, from the highly variable tempos of a passionate Captain Kirk to the measured, burst-free speech of the emotionless Mr. Spock. The actors deploy tempo stylistically, creating emotions and personalities that audiences understand.
To calculate evenness and irregularity in tempo, I adapt measures of burstiness that network traffic engineers use for packets traveling across the Internet: I compute the variance in time between syllable nuclei in an utterance, then divide that variance by 0.5 times the number of syllables. The bigger the ratio, the more the utterance is characterized by clusters.
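The burstiness ratio just described can be sketched in a few lines. This is an illustrative reading of the description, not the author's code: the function name and the timestamps are hypothetical, and population variance is assumed because the abstract does not say whether population or sample variance was used.

```python
from statistics import pvariance

def burstiness(nucleus_times):
    """Variance of the gaps between syllable nuclei, divided by 0.5 * number of syllables.

    nucleus_times: ascending timestamps (one per syllable nucleus), at least two.
    Assumes population variance; the source does not specify which variance is meant.
    """
    intervals = [later - earlier for earlier, later in zip(nucleus_times, nucleus_times[1:])]
    n_syllables = len(nucleus_times)
    return pvariance(intervals) / (0.5 * n_syllables)

# Hypothetical timestamps in milliseconds for two five-syllable utterances.
even_speech = [0, 200, 400, 600, 800]   # perfectly regular gaps
bursty_speech = [0, 50, 100, 600, 650]  # two quick clusters separated by a long gap

even_score = burstiness(even_speech)
bursty_score = burstiness(bursty_speech)
```

With these made-up timings the regular utterance scores zero and the clustered one scores much higher, which matches the statement that a bigger ratio means more clustering.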
Kirk’s burstiness differs significantly from his crew: at least four times greater than all the others at their burstiest. Everyone’s bursts correlate to emotional “hot spots”—areas of increased involvement (Çetin and Shriberg 2006).
I demonstrate that meanings of tempo are structured by two themes: (i) arousal (“action readiness”) and (ii) ideologies about time. These emerge not just in the Star Trek data but from building indexical fields for “fast talk” and “slow talk.” In the spirit of Eckert (2008), the fields begin with proven correlations. I also develop a rapid survey methodology: results from 50 participants chart a constellation of ideological meanings that describe who talks fast/slow and when.
This paper differs from most work on social meaning by focusing on a suprasegmental aspect of speech. It also draws upon psychology, anthropology, musicology, and computer science. Its use of performances distills stylistic tempo from reflexive, cognitive effects, offering insights that assist our understanding of how tempo gets used in naturally-occurring speech.
"Data is the new water in the digital age"
Anthony (Tony) Nolan OAM, Lead Data Scientist, G3N1U5 Pty Ltd, presented a summary of his research as part of the SMART Seminar Series on 6 June 2016.
For more information, visit the event page at: http://smart.uow.edu.au/events/UOW214302.html.
Based on a talk by Carol Lethaby at TESOL, 2017 Seattle.
Some argue that girls and boys learn language differently. Using classroom video and the concepts of 'priming' and 'stereotype threat', the presenter asserts that education, not hardwiring, is what ensures that both sexes flourish when learning language. Teaching ideas to combat sexism and promote success with all children are presented.
Obtaining real meaning from your enterprise social measurement (with a few su... - Cai Kjaer
Obtaining real meaning from your enterprise social measurement (with a few surprises along the way!). This was presented at the DEX2018 conference in Sydney in June 2018 by Cai Kjaer from SWOOP Analytics.
Iulia Pasov, Sixt. Trends in sentiment analysis. The entire history from rule... - IT Arena
Iulia Pasov is a senior Data Scientist working for Sixt SE, as well as a PhD student in Artificial Intelligence and Psychology and a WiDS Ambassador. As a Data Scientist, Iulia focuses on building AI-based services meant to optimize car rental processes, as well as pipelines for automatic training and deploying of machine learning models. For her studies, she searches ways to improve learning in online knowledge building communities with the use of artificial intelligence.
Speech Overview:
Sentiment analysis is one of the best-known sub-domains of Natural Language Processing (NLP), used especially in the classification of feedback messages. This talk will condense over 15 years of research on different approaches to sentiment analysis as they evolved over time. The audience will be guided through the advantages and disadvantages of each method, in order to understand how to approach the topic given their needs.
Training and Development Final Project Now its your turn! Below.docx - TakishaPeck109
Training and Development
Final Project
Now it's your turn! Below is all the information given on a training program needed, called Effective Communication. You are a trainer in the given situation. Submit all of the following:
1. Training Needs Assessment (refer to previous assignment DST Systems for assessment template)
2. Written Paper
3. PowerPoint Presentation
Instructions for Paper
· Write at least 500 words (approximately 2 pages) using Microsoft Word in APA style; see example below.
· Use font size 12 and 1” margins.
· Include cover page and reference page.
· At least 80% of your paper must be original content/writing.
· No more than 20% of your content/information may come from references.
· Use at least three references from outside the course material; one reference must be from EBSCOhost. Textbook, lectures, and other materials in the course may be used, but are not counted toward the three-reference requirement.
· Cite all reference material (data, dates, graphs, quotes, paraphrased words, values, etc.) in the paper and list it on a reference page in APA style.
References must come from sources such as scholarly journals found in EBSCOhost, CNN, online newspapers such as The Wall Street Journal, government websites, etc. Sources such as Wikis, Yahoo Answers, eHow, blogs, etc. are not acceptable for academic writing.
Instructions for PowerPoint Presentation
Create a PowerPoint presentation and record yourself presenting the response to the assignment. The presentations should be a minimum of six minutes in length and include at least 10 slides.
The requirements below must be met for your presentation to be accepted and graded:
· Design and format each slide for a presentation; see example below.
· Include a cover slide and reference slide (these slides do not count toward the 10-slide requirement).
· Use at least three references from outside the course material, preferably from EBSCOhost. Textbook, lectures, and other materials in the course may be used, but are not counted toward the three-reference requirement.
· Identify sources on slides that contain reference material (data, dates, graphs, quotes, paraphrased words, values, etc.) and list them on a reference slide.
References must come from sources such as scholarly journals found in EBSCOhost or on Google Scholar, government websites and publications, reputable news media (e.g. CNN, The Wall Street Journal, New York Times) websites and publications, etc. Sources such as Wikis, Yahoo Answers, eHow, blogs, etc. are not acceptable for academic writing.
A detailed explanation of how to cite a source using APA can be found here (link).
Download a PowerPoint example here
Situation:
Tim Smith the IT manager comes to you and says "My project coordinators are in a slump; they just are not producing their usual caliber of work. I need to find out what the problem is. No one on the project team knows what is going on. The c.
Similar to Gender and language (linguistics, social network theory, Twitter!) (20)
Outside of emoji researchers, lots of people still forecast disaster or dream of universal communication even if most of us are confident that neither is nigh. Despite our protests, emoji inspire visions of apocalypse and utopia.
As with many linguistic resources (sounds, words, syntax), people use emoji to grind all sorts of axes. For example, people who say that women use more emoji than men are usually making some point that the data don't support. The first step in such an analysis is to ignore or discount the fact that, say, Snoop Dogg and Kyle MacLachlan are among the biggest emoji users in the world.
In this talk, I'll demonstrate how ideologies of emoji work themselves out across 870 journalists that political scientists have separately scored as liberal, conservative, or centrist. This lets us compare objective vs. subjective stances and inverts the idea that gender explains emoji to show how it is that emoji are a way that people "do" gender differently based on their political commitments.
We aren’t surprised by facial recognition at security checkpoints. But how do you feel about face-scanning toilet roll dispensers? What if they don’t just find criminals but try to detect “criminality”? Laws and policies almost always lag technology so data scientists and machine learning experts are among the first line of ethical defense. The argument in this talk is that to be ethical, any system that classifies human beings has to consider the goals of the people affected by the system, not just the builders’ goals. This is not particularly convenient, but there are concrete ways to put goal-oriented design into practice. Doing so puts us in a better position to practice ethical behavior and attempt to address problems of power and the reproduction of inequality.
Emotion detection: an overview and some new ways forward
In this presentation, you’ll learn how computational linguists, phoneticians, and psychologists have approached emotion detection. You’ll learn about measurements and get a summary of the cues that seem to matter most.
The vast majority of research has used actors performing prototypical emotions like anger, sadness, joy, fear, and disgust (“read the alphabet angrily!”). In real life, emotions are less extreme and more mixed. Studies of natural speech are still a bit hard to come by, but have been increasing in recent years. We’ll focus on these methods and results, which transfer more easily to real-world applications.
Most descriptions of endangered languages focus on forms and structure: what are the vowels? How do you construct the passive? Imagine focusing on how linguistic resources are used emotionally — the creation of affective grammars. What sort of phenomena might we look for and what methods would we use? As a starting off point, I'll discuss my work on Shabo, an endangered isolate spoken in the coffee-growing mountains of Ethiopia.
http://linguistics.berkeley.edu/~fforum/2011-fall.php
True language universals can be hard to find but two of the most solid are (1) languages change and (2) people are really good at adapting to what’s handy. We’ll explore the ways emoji are changing, ways they haven’t, and where to look for hot spots of innovation.
Computing with Affective Lexicons: Computational Linguistics Tutorial with Da...Tyler Schnoebelen
For the Bay Area NLP Meetup (Natural Language Processing)
This talk is a tutorial summarizing useful methods for using dictionaries and related lexical features to compute affective and social meaning. I’ll define different kinds of social and affective meaning, introduce a number of useful dictionaries, and then give examples from domains like analyzing restaurant reviews and menus, predicting stocks, and detecting interpersonal style in dating.
Dan Jurafsky is a Professor at Stanford University and a MacArthur Fellow. NLP people will know him best as the author of "Speech and Language Processing", with James Martin.
Website: https://web.stanford.edu/~jurafsky/
Crowdsourcing big data_industry_jun-25-2015_for_slideshareTyler Schnoebelen
A presentation to government officials doing crowdsourcing and citizen science. What can machine learning techniques and industry use cases do to help get the most out of data (and big data).
What are "lexical resources" that can go into defining words and phrases? Visualizations and resources for studying language. (Presentation given at Dictionary.com)
Gender and language (linguistics, social network theory, Twitter!)
1. Gender, language and Twitter:
Social theory and computational methods
Tyler Schnoebelen (including work with
David Bamman and Jacob Eisenstein)
Tweet this talk!
@Tschnoebelen
2. Welcome to the slide-u-ment
• Hi, you may want to check out the “Notes”
fields for additional context.
4. At its most basic
• Assumption 1: Men and women use different
vocabularies
– Hypothesis I: Computational methods can cut through
noise and predict speaker gender based on the words
they use
• Assumption 2: Social networks are typically
“homophilous” (birds of a feather flock together)
– Hypothesis II: Adding the gender make-up of a user’s
social network should get even better prediction
5.
6. Let’s say we can predict gender
• So what?
• Does it license us to connect words/word
groups to the social category in question?
• This assumes that gender is
– Stable
– The primary driving force
7. Our actual goal
• Problematize gender prediction as a task
– Define a system where we could just “stop” and
call it good
– But NOT ACTUALLY STOP
• Demonstrate that simple gender binaries
aren’t actually descriptively accurate
• Show ways to combine social theory and
computational methods that expand the
questions on both sides
10. Typical findings
• Women use standard variables
more often than men.
– In fact, early dialectologists
ignored women completely
because they wanted
“NORMS”—non-mobile, older,
rural male speakers, seen as
preserving the purest regional
(non-standard) forms
• See Chambers and Trudgill
(1980).
– Did they do it for prestige (to
acquire social capital)?
– To avoid losing status?
– Are women actually creating
norms, not following them?
11. Computational/corpus work
• People are fascinated by gender differences
• In order to get statistical significance, you
have to have enough data where you can
detect a signal
• In the past, this has led researchers to roll up
words into word classes
12. The most common distinctions
• Men use informative language
– Prepositions (to), attributive adjectives (fat),
higher word lengths (gargantuan)
• Women use involved language
– First and second person pronouns (you), present
tense verbs (goes), contractions (don’t)
• (Argamon, Koppel, Fine, & Shimoni, 2003; Herring & Paolillo, 2006b; Schler,
Koppel, Argamon, & Pennebaker, 2006…they are working off of dimensions in
Biber 1995 and Chafe 1982)
13. Or “contextuality”
• Men are formal and explicit
– Nouns (floor), adjectives (big), prepositions (to), articles
(the)
• Women are deictic and contextual
– Pronouns (you), verbs (run), adverbs (happily),
interjections (oh!)
• “Contextuality” decreases when an unambiguous
understanding is more important or difficult—when
people are physically or socially farther away
• (Mukherjee & Liu, 2010; Nowson, Oberlander, & Gill, 2005 building off of
Heylighen and Dewaele 2002)
18. Our approach also lumps
• It’s just at a lower level
– instead of “nouns” or “blog
words”
– we assume all usages of a
unigram are identical
• Lumping itself isn’t a
problem. In fact, you have
to.
– But ideologies are going to
structure your lumpings
and divisions, so watch
out!
20. Data
• Public Twitter messages in same-gender and cross-gender
social networks
– Word frequencies (unigrams)
– Gender (induced from first names using the Social Security
Administration data)
• 14,464 Twitter users (56% male)
– Geolocated in the US
– Must use 50 of top 1,000 most frequent words
– Between 4 and 100 ties (at least 2 “mutual @’s” separated by 14
days)
• Women have 58% female friends
• Men have 67% male friends
• 9.2M tweets, Jan-Jun 2011
21. Twitter has a pretty good swath (Pew)
• Nearly identical usage among women and
men:
– 15% of female internet users are on Twitter
– 14% of male internet users
• High usage among non-Hispanic Blacks (28%)
• Even distribution across income and education
levels
• Higher usage among young adults (26% for
ages 18-29, 4% for ages 65+)
22. First names are highly gendered
Name | % male | % female
Matt | 100 | 0
Alex | 97 | 3
Chris | 86 | 14
Kelly | 15 | 85
Sarah | 0 | 100
95% of users have a name that is at least 85% associated with one gender
Median user name is 99.6% associated with its majority gender
23. First step: gender prediction
• Logistic regression:
– Will you have a heart attack Y/N?
– Will you vote for X or Y?
– Will your Brazilian Portuguese nouns and modifiers
agree in number?
• Logistic regression is the statistical technique at
the core of variable rule analysis (Tagliamonte
2006)
• But we’re going to reverse the direction for what
sociolinguists typically do
24. First step: gender prediction
• The relevant linguistic variables aren’t known
beforehand
• So the dependent variable—the thing we are
trying to predict—is author gender
• The independent variables are the 10,000
most frequent lexical items in the tweets
25. Preventing overfitting
• This involves estimating a lot of parameters.
• Which raises the risk of overfitting: learning
parameter values that perfectly describe the
training data but won’t generalize to new data
26. Why regularize?
Regularization dampens the effect of an
individual variable (Hastie et al 2009).
A single regularization parameter controls the
tradeoff between perfectly describing the
training data and generalizing to unseen data.
27. Evaluating accuracy
• We use the typical method of cross-validation.
1. Randomly divide the full dataset into 10 parts.
2. Train on 80% of the data
3. Use 10% of the data to tune the regularization
parameter
4. Now, use the model to predict the other 10%
5. Compare the predictions to what really happened
• Do this 10 times and take the average.
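The splitting procedure on this slide can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `ten_fold_splits` is hypothetical, and using the fold after the test fold as the tuning set is an assumption (the slide only gives the 80% / 10% / 10% proportions).

```python
import random

def ten_fold_splits(n_items, seed=0):
    """Yield (train, dev, test) index lists: 80% / 10% / 10% for each of 10 folds."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)          # step 1: random division
    folds = [indices[i::10] for i in range(10)]   # 10 roughly equal parts
    for k in range(10):
        dev_k = (k + 1) % 10                      # assumption: next fold tunes regularization
        test = folds[k]
        dev = folds[dev_k]
        train = [i for j, fold in enumerate(folds)
                 if j not in (k, dev_k) for i in fold]
        yield train, dev, test
```

Each author lands in the test set exactly once across the 10 rounds, so averaging the 10 test accuracies uses every author for evaluation without ever testing on data the model was trained or tuned on.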
28. Gender prediction results
• State-of-the-art accuracy: 88.0%
– Lexical features strongly predict gender
– Ignoring syntax (treating tweets as “bags of
words”) does pretty well
29.
Feature | Previous literature | In our data
Pronouns | F | F
Emotion terms | F | F
Family terms | F | Mixed results
"Blog words" (lol, omg) | F | F
Conjunctions | F | F (weakly)
Articles | M | No results
Numbers | M | M
Quantifiers | M | No results
Technology words | M | M
Prepositions | Mixed results | F (weakly)
Swear words | Mixed results | M
Assent | Mixed results | Mixed results
Negation | Mixed results | Mixed results
Emoticons | Mixed results | F
Hesitation markers | Mixed results | F
Top 500 markers for each gender
30. At a corpus level, women use more non-dictionary words and men use more named entities. In a moment we’ll ask how universal this is.
Hand classification of most frequent 10k words (90.0% agreement)
Word class | Female authors | Male authors
Common words in a standard dictionary | 74.2% | 74.9%
Punctuation | 14.6% | 14.2%
Non-standard, unpronounceable words (e.g., :), lmao) | 4.28% | 2.99%
Non-standard, pronounceable words (e.g., luv) | 3.55% | 3.35%
Named entities | 1.94% | 2.51%
Numbers | 0.83% | 0.99%
Taboo words | 0.47% | 0.69%
Hashtags | 0.16% | 0.18%
31. Involvement
• Using traditional definitions, it looks as if our
data confirms:
– men as more informational (all those named
entities)
– women as more interactive/involved (pronouns,
emoticons, etc.)
• Note that most of the named entities for the
men are sports figures and teams
34. Clustering without regard to gender
• We apply probabilistic clustering in order to
group authors who are linguistically similar
• Each author is represented as a list of word
counts across the 10,000 words used in the
classification experiment
35. Clustering! (Hastie et al 2009)
Easy example: 2 clusters “Expectation Maximization”
1. Randomly assign all authors to
one of 20 clusters
2. Calculate the center of the cluster
from the average word counts of
all authors put in it
3. Assign each author to the nearest
cluster, based on the distance
between their word counts and
the average word counts of the
cluster center
4. Keep iterating through this moving
from random clustering to
meaningful clusters
5. Repeat steps 1-4 (25 times)
6. Pick the best
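The steps on this slide can be sketched as a hard-assignment clustering loop with random restarts. This is a simplified illustration under stated assumptions, not the authors' model: it uses squared Euclidean distance to cluster centers and picks the restart with the lowest total distance, whereas the speaker notes describe EM with log-linear word distributions selected by likelihood. All function names are hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two word-count vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centers_for(authors, assign, k, rng):
    """Average word counts per cluster; an empty cluster is reseeded (assumption)."""
    centers = []
    for c in range(k):
        members = [a for a, z in zip(authors, assign) if z == c]
        if members:
            centers.append([sum(col) / len(members) for col in zip(*members)])
        else:
            centers.append(list(rng.choice(authors)))
    return centers

def cluster_once(authors, k, rng, max_iters=100):
    assign = [rng.randrange(k) for _ in authors]            # step 1: random start
    for _ in range(max_iters):
        centers = centers_for(authors, assign, k, rng)      # step 2: cluster centers
        new_assign = [min(range(k), key=lambda c: sq_dist(a, centers[c]))
                      for a in authors]                     # step 3: nearest center
        if new_assign == assign:                            # step 4: iterate until stable
            break
        assign = new_assign
    centers = centers_for(authors, assign, k, rng)
    cost = sum(sq_dist(a, centers[z]) for a, z in zip(authors, assign))
    return assign, cost

def cluster(authors, k, restarts=25, seed=0):
    """Steps 5 and 6: repeat from fresh random starts and keep the best run."""
    rng = random.Random(seed)
    return min((cluster_once(authors, k, rng) for _ in range(restarts)),
               key=lambda result: result[1])
```

Restarts matter because each run only reaches a local optimum that depends on the random starting assignment.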
36. Some definitions
• Style: combinations of linguistic resources
• Cluster: a group of authors who use a
particular style
• Social network: each author has a social
network made up of people they send messages
to AND receive messages from
• An author’s social network does not have to
be a part of that author’s cluster
38. Looks like “women are trying to
destroy the English language”
Word class | Female authors | Male authors
Common words in a standard dictionary | 74.2% | 74.9%
Punctuation | 14.6% | 14.2%
Non-standard, unpronounceable words (e.g., :), lmao) | 4.28% | 2.99%
Non-standard, pronounceable words (e.g., luv) | 3.55% | 3.35%
Named entities | 1.94% | 2.51%
Numbers | 0.83% | 0.99%
Taboo words | 0.47% | 0.69%
Hashtags | 0.16% | 0.18%
39. Clusters that are majority female
• At the population level, women use many non-
dictionary words.
• But there are clusters of (mostly) women who
actually use fewer words like lol, nah, haha than men
do
Cluster | Size | % fem | Top words
c14 | 1,345 | 89.60% | hubs blogged bloggers giveaway @klout recipe fabric recipes blogging tweetup
c6 | 661 | 80.00% | authors pokemon hubs xd author arc xxx ^_^ bloggers d:
c4 | 1,376 | 63.00% | && hipster #idol #photo #lessambitiousmovies hipsters #americanidol #oscars totes #goldenglobes
40. Consider xo
• A lot more women use xo than
men
– 11% of all women
– 2.5% of all men
• But that means that 89% of
women aren’t using it at all.
• People who use xo are three
times more likely to use ttyl (‘talk
to you later’)
– The style is more commonly
adopted by women
– But there’s other stuff going on
here: age, job, etc.
– It’s not clear that gender is even
the most important, it’s just that
we’re starting with gender-colored
glasses
43. Group Gender Activity/social role Interactions Geography
Shit Guys Don't Say Out Loud
Shit College Freshmen Say
Shit Girlfriends Say
Shit Asian Dads Say
Shit Redneck Guys Say
Shit Girls Say to Gay Guys
Shit Black Girls Say
Shit Black Guys Say
Shit People Say in LA
Shit White Girls Say…to Black Girls
Shit New Yorkers Say
Shit Frat Guys Say
Shit Whipped Guys Say
Shit Guys Don't Say
Shit Asian Girls Say
Shit Tumblr Girls Say
Shit Brides Say
Shit Spanish Girls Say
Shit Asian Moms Say
Shit Vegans Say
Shit Hipsters Say
Shit Cyclists Say
Shit Yogis Say
Shit Skiers Say
44. Notice
• That gender wasn’t really limited to the
“gender” column
– “Moms” and “dads” are gendered social roles
• And that the words “guys” and “girls” aren’t
really the same as “male” and “female”
– What are the plausible age ranges and social
styles for “guys” and “girls”?
45. Clusters that are majority male
Cluster | Size | % male | Top words
c13 | 761 | 89.40% | #nhl #bruins #mlb nhl #knicks qb @darrenrovell inning boozer jimmer
c10 | 1,865 | 85.40% | /cc api ios ui portal developer e3 apple's plugin developers
c18 | 623 | 81.10% | @macmiller niggas flyers cena bosh pacers @wale bruh melo @fucktyler
c11 | 432 | 73.80% | niggas wyd nigga finna shyt lls ctfu #oomf lmaoo lmaooo
c20 | 429 | 72.50% | gop dems senate unions conservative democrats liberal palin republican republicans
c15 | 963 | 65.30% | #photo /cc #fb (@ brewing #sxsw @getglue startup brewery @foursquare
46. Looks like “men are Twitter-headed
sailor-swearing accountants”
Word class | Female authors | Male authors
Common words in a standard dictionary | 74.2% | 74.9%
Punctuation | 14.6% | 14.2%
Non-standard, unpronounceable words (e.g., :), lmao) | 4.28% | 2.99%
Non-standard, pronounceable words (e.g., luv) | 3.55% | 3.35%
Named entities | 1.94% | 2.51%
Numbers | 0.83% | 0.99%
Taboo words | 0.47% | 0.69%
Hashtags | 0.16% | 0.18%
47. Aggregates generally don’t hold
Cluster | Top words | Notes
c13 | #nhl #bruins #mlb nhl #knicks qb @darrenrovell inning boozer jimmer | Few Taboo/Hashes, lots of Punc
c10 | /cc api ios ui portal developer e3 apple's plugin developers | Few Taboo/Hashes, lots of Punc
c18 | @macmiller niggas flyers cena bosh pacers @wale bruh melo @fucktyler |
c11 | niggas wyd nigga finna shyt lls ctfu #oomf lmaoo lmaooo | Few Dict words, lots of unPron and Pron
c20 | gop dems senate unions conservative democrats liberal palin republican republicans | Few Taboo/Hashes, lots of Punc
c15 | #photo /cc #fb (@ brewing #sxsw @getglue startup brewery @foursquare | Few Taboo, lots of Punc
48. Small exceptions
• At the population level, men use many named
entities and numbers
• Clusters use these at various rates, but:
– No female-skewed clusters use them *more* than the
male average
– No male-skewed clusters use them *less* than the
female average
• But again, the other 6 generalizations about
gender we might have made at an aggregate
level aren’t supported once we get to clusters
49. Erasure!
• Clusters are highly gendered
• For example, let’s consider
clusters made up of 60% or more
of people of the same gender
– That covers 82.95% of all the
authors
– But what about the 1,242 men
who are part of female-majority
clusters?
– The 1,052 women who are part of
male-majority clusters?
– Are they just noise? Odd-balls? Is
there no structure to what they’re
doing?
– These people are using language to
do identity work, even as they
construct identities at odds with
conventional notions of
masculinity and femininity.
50. Clusters vs. social networks
• The more skewed a cluster is, the more
skewed the social networks of its members
[Figure: scatter plot of clusters, each point labeled with its cluster number; axes: percent male, percent male friends]
51. Women with female networks use the
most female markers
[Figure: female authors in bins 1 to 10; axes: percent female social network, female marker proportion]
52. Men with male networks use the most
male markers
[Figure: male authors in bins 1 to 10; axes: percent male social network, male marker proportion]
53. Women with male networks use more
male markers (and vice versa)
54. Women with highly female networks
are easier to classify (and vice versa)
55. In other words
• The classifier is picking up on the fact that if you
insist upon a gender binary then people with
same-gender networks use language in a more
“gender-coherent” way.
56. Does social network help prediction?
• 88% accuracy with text alone
– Logistic regression, 10-fold cross-validation
– State-of-the-art accuracy
• Add network information…
– Still 88% accuracy
57. Once we have 1000 words/author,
network info doesn’t help
[Figure: accuracy by number of words per author (0, 10, 100, 1,000, 10,000, all); two curves: words only, words plus social network; labeled value: 0.880]
58. Wait, why not?
• A new feature is only going to improve classification
accuracy if it adds new information.
• There is strong homophily: 63% of the connections are
between same-gender individuals.
• But language and social network can’t mutually
disambiguate because they aren’t independent views
on gender.
• Individuals who use linguistic resources from “the
other gender” consistently have denser social network
connections to the other gender.
– Performance, style, accommodation
• Gender is not an “A or B” kind of thing
61. Not so simple
• If we want to understand categories, we
should start with people in interactions.
– Counting is great but we have to watch our bins
and investigate them, too.
67. Positioning and stance
• “Stance” is usually seen as an
expression of a speaker’s
relationship to their talk and
their interlocutors
– E.g., Kiesling (2009); Du Bois
(2007); Bednarek (2008)
• But “stance” (and “roles”)
seem static
• I’d like something with more
motion and dynamism
• I develop positioning to
connect linguistic forms to
social structures
• (Particularly affect, actually)
70. Positioning in a social grid
• Social structures are
created, maintained,
and changed by
specific interactions
• People enter
interactions already
positioned
• Interactions change
these positions,
people are attentive
to changes
71. Conventions
• Different linguistic
resources come to be
associated with different
positionings
• Distributions of
experiences are usually
maintained
• The maintenance and
disruption of expectations
has (affective)
consequences
73. CHILDES (MacWhinney, 2000)
• 4,676 transcripts of parent-child interactions
– American English
Direction | Observed little | Expected little | O/E
Mothers-to-boys | 4,313 | 4,158 | 1.037
Fathers-to-boys | 1,516 | 1,381 | 1.098
Mothers-to-girls | 6,312 | 5,441 | 1.160
Fathers-to-girls | 230 | 281 | 0.819
Girls-to-mothers | 1,221 | 1,533 | 0.796
Girls-to-fathers | 4 | 3 | 1.482
Boys-to-mothers | 875 | 1,526 | 0.573
Boys-to-fathers | 117 | 265 | 0.441
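The O/E column is the observed count of little divided by the expected count. A minimal sketch of that arithmetic, using four rows from the table above: the expected counts are taken as given, since the slide does not say how they were derived, and the function name `o_over_e` is hypothetical.

```python
def o_over_e(observed, expected):
    """Observed / expected; above 1 means 'little' is used more than expected."""
    return observed / expected

# (observed, expected) counts of "little" copied from the CHILDES table
childes = {
    "Mothers-to-boys": (4313, 4158),
    "Fathers-to-boys": (1516, 1381),
    "Mothers-to-girls": (6312, 5441),
    "Fathers-to-girls": (230, 281),
}

ratios = {label: round(o_over_e(o, e), 3) for label, (o, e) in childes.items()}
for label, ratio in ratios.items():
    print(f"{label}: {ratio:.3f}")
```

For rows with very small counts (for example Girls-to-fathers, 4 observed against 3 expected), recomputing from the rounded counts shown on the slide will not reproduce the printed ratio, presumably because the expected value was rounded for display.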
74. Gender and little
• Women tend to use little more—multiple corpora show significant
differences
• But this misses the point
Gender | Buckeye OE | CALLHOME OE
Female | 1.170 | 1.073
Male | 0.855 | 0.725
75. Add interlocutor gender
Direction | CHILDES Parent-Child OE | CHILDES Child-Parent OE | Buckeye OE | Fisher Am. Eng. OE | Fisher Ohioans OE | CALLHOME OE
Female to female | 1.160 | 0.796 | 0.936 | 1.051 | 1.160 | 1.088
Female to male | 1.037 | 1.482 | 1.290 | 0.887 | 0.771 | 1.064
Male to male | 1.098 | 0.441 | 0.879 | 1.071 | 0.830 | 0.685
Male to female | 0.819 | 0.573 | 0.908 | 0.842 | 0.836 | 0.727
76. Gender and topics
• Some topics are more face-threatening than others.
– Face-threatening topics get less little.
• When topic is held constant, men and women mostly have the
same little usage.
– Regardless of the gender of the person they’re talking to.
• But there are some exceptions, which are connected to issues of
masculinity, femininity, and emotional regulation.
– Some examples:
• Generally, people don’t use little to talk about terrorism. EXCEPT women
speaking to women use little to modify emotions (terrified, scared)
• Generally, people DO use little to talk about fitness. EXCEPT men talking to
men. The men talking to women use little to talk about their pudgy, flabby
bodies. The few men talking to men who use little use it to talk about working
out a little harder or putting on a little more muscle mass.
77. ICSI meeting corpus (Janin et al., 2003)
• 75 meetings from Berkeley’s International
Computer Science Institute (2000-2002)
– 3-10 participants (avg of 6)
– 17-103 minutes each (usually an hour)
– 72 hours of data
Role | # speakers (avg age) | Observed little | Expected little | O/E
Undergrad | 6 (30 yo) | 59 | 34 | 1.734
Grad | 14 (29 yo) | 234 | 223 | 1.049
Postdoc | 1 (not given) | 51 | 75 | 0.676
Ph.D. | 11 (37 yo) | 152 | 228 | 0.667
Professor | 4 (52 yo) | 278 | 213 | 1.302
78. Gender, genre, topic, style
• “Different ways of saying things are intended
to signal different ways of being, which
includes different potential things to say.”
(Eckert 2008)
84. Gender is binary only with blinders
• “My mom doesn’t
say that’s lovely or
omg!...”
– “Nevermind that!”
• Problem: Sliding
from predictive
accuracy to causal
stories
• Realistic finding:
There are lots of
ways to do gender
85. Big data, big opportunities
• Big data offers us
the opportunity
to let clusters
emerge (and test
them against our
big bins)
• We can show how
language reflects
and creates the
social worlds we
live in
We also ran our work with part-of-speech tagged unigrams for one level less lumping—the results are basically the same but not reported here.
We only select users with first names that occur over 1,000 times in the census data (approximately 9,000 names), the most infrequent of which include Cherylann, Kailin and Zeno.
We further filtered our sample to only those individuals who are actively engaging with their social network. Twitter contains an explicit social network in the links between individuals who have chosen to receive each other’s messages. However, Kwak, Lee, Park, and Moon (2010) found that only 22 percent of such links are reciprocal, and that a small number of hubs account for a high proportion of the total number of links. Instead, we define a social network based on direct, mutual interactions. In Twitter, it is possible to address a public message towards another user by prepending the @ symbol to the recipient’s user name. We build an undirected network of these links. To ensure that the network is mutual and as close a proxy to a real social network as possible, we form a link between two users only if we observe at least two mentions (one in each direction) separated by at least two weeks. This filters spam accounts, unrequited mentions (e.g., users attempting to attract the attention of celebrities), and one-time conversations. We selected only those users with between four and 100 mutual-mention friends. The upper bound helps avoid ‘broadcast-oriented’ Twitter accounts such as news media, corporations, and celebrities.
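The mutual-mention rule can be sketched as follows. This is an illustration, not the authors' pipeline: the input format (tuples of sender, recipient, date) and the function names are hypothetical, and reading "separated by at least two weeks" as "some mention in one direction and some mention in the other direction are at least 14 days apart" is an assumption.

```python
from collections import defaultdict
from datetime import date, timedelta

def mutual_links(mentions, min_gap=timedelta(days=14)):
    """Return undirected links (frozensets) between users with mutual mentions."""
    times = defaultdict(list)
    for sender, recipient, when in mentions:
        if sender != recipient:
            times[(sender, recipient)].append(when)
    links = set()
    for (a, b), sent in times.items():
        back = times.get((b, a))          # mentions in the opposite direction
        if not back:
            continue                      # unrequited mention: no link
        # Largest gap between any a->b mention and any b->a mention
        gap = max(max(sent) - min(back), max(back) - min(sent))
        if gap >= min_gap:
            links.add(frozenset((a, b)))
    return links

def friend_counts(links):
    """Number of mutual-mention friends per user."""
    counts = defaultdict(int)
    for link in links:
        for user in link:
            counts[user] += 1
    return dict(counts)

def eligible_users(links, low=4, high=100):
    """Users with between `low` and `high` mutual-mention friends, inclusive."""
    return {u for u, n in friend_counts(links).items() if low <= n <= high}
```

A one-way mention, or an exchange that starts and ends within a few days, produces no link under this rule, which is what screens out celebrity-directed mentions and one-time conversations.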
--e.g., The Social Security Administration says:
Tyler is a male name 97.36% of the time
Annette is a female name 100% of the time
Robin is female 87.69% of the time
Some names are ambiguous by gender, but in our dataset, such ambiguity is rare: the median user has a name that is 99.6% associated with its majority gender; 95% of all users have a name that is at least 85% associated with its majority gender. We assume that users tend to self-report their true name; while this may be largely true on aggregate, there are bound to be exceptions. Our analysis therefore focuses on aggregate trends and not individual case studies. A second potential concern is that social categories are not equally relevant in every utterance. But while this is certainly true in some cases, it is not true on aggregate — otherwise, accurate gender prediction from text would not be possible. Later, we address this issue by analyzing the social behavior of individuals whose language is not easily associated with their gender.
All words were converted to lower-case but no other preprocessing or stopword filtering was performed.
Basically you train on part of the data but hold out part of it to tune the regularization parameter.
Example: If a single word, like indubitably, were used three times by men and never by women, an overfit model would have high confidence that anyone who uses indubitably is a man, regardless of the other words they use.
That would be dumb. So we use regularization.
The accuracy in gender prediction by this method is 88.0%, which is state of the art compared with gender prediction on similar datasets (Burger et al. 2011). While more expressive features might perform better still, the high accuracy of lexical features shows that they capture a great deal of language’s predictive power with regard to gender.
More specifically, we apply the standard machine learning technique of logistic regression (Hastie, Tibshirani, & Friedman, 2009). The model learns a column vector of weights w to parametrize a conditional distribution over labels (gender) as $P(y \mid x; w) = 1 / (1 + \exp(-y\, w^{\top} x))$, where $y \in \{-1, 1\}$ and x represents a column vector of term frequencies. The weights are chosen to maximize the conditional likelihood P(y | x; w) on a training set, using quasi-Newton optimization. To prevent overfitting of the training data, we use standard L2 regularization; this is equivalent to ridge regression in linear regression models. As features, we used a boolean indicator for each of the most frequent 10,000 words in the dataset.
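A minimal sketch of L2-regularized logistic regression over boolean word indicators may help make this concrete. It is not the authors' implementation: it assumes labels coded as -1 and +1, and it swaps in plain gradient ascent for the quasi-Newton optimizer. The names `train_logreg`, `predict_proba`, `l2`, `lr`, and `epochs` are hypothetical.

```python
import math

def train_logreg(X, y, l2=0.1, lr=0.5, epochs=200):
    """X: list of 0/1 feature lists; y: labels in {-1, +1}."""
    n = len(X)
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        # Gradient of the L2 penalty -(l2 / 2) * ||w||^2.
        grad = [-l2 * wj for wj in w]
        for xi, yi in zip(X, y):
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            # d/dw log sigmoid(y * w.x) = y * x * sigmoid(-y * w.x)
            coeff = yi / (1.0 + math.exp(margin))
            for j, xj in enumerate(xi):
                grad[j] += coeff * xj / n
        # Gradient ascent step on the penalized average log-likelihood.
        w = [wj + lr * gj for wj, gj in zip(w, grad)]
    return w

def predict_proba(w, x):
    """Probability of the +1 label under the fitted weights."""
    return 1.0 / (1.0 + math.exp(-sum(wj * xj for wj, xj in zip(w, x))))
```

A larger `l2` shrinks the weights toward zero, which is what keeps a rare word seen only a handful of times from receiving an extreme weight.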
Train a statistical model on part of the data.
Logistic regression (Hastie, Tibshirani, & Friedman, 2009)
Test it on a different part of the data, hiding the gender labels.
10-fold cross-validation: 10 unique training/test splits (so the test is a different 10% of the data)
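The train/test procedure in these notes can be sketched as a generator of fold indices. This is an illustrative sketch only; the function name `k_fold_splits` and the fixed `seed` are assumptions, and how the real folds were drawn is not stated in the source.

```python
import random

def k_fold_splits(n_items, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

Each item lands in the test set exactly once, so the reported accuracy is always measured on users the model did not see during training.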
Not shown:
Clitics: previous lit “F”, our data: weakly “F”
Because the counts are so high, all differences are statistically significant at p < 0.01.
Hand-classified by two authors, disagreements decided by discussion between all three authors.
But categories are never simply descriptive; they are normative statements that draw lines around who is included and excluded (Butler 1990).
Expectation-maximization (EM) algorithm (Dempster et al., 1977); basically k-means with log-linear distributions (Eisenstein, Ahmed, and Xing, ICML 2011)
25 runs with randomly generated Q(z_n = k); we select the run with the highest joint likelihood.
Each author is assigned a distribution over clusters; each cluster has a probability distribution over word counts and a prior strength. In the EM algorithm, these parameters are iteratively updated until convergence. The probability distribution over words uses the Sparse Additive Generative Model (Eisenstein, Ahmed, and Xing 2011), which is especially well suited to high-dimensional data like text. For simplicity, we perform a hard clustering, sometimes known as hard EM. Since the EM algorithm can find only a local optimum, we make 25 runs with randomly generated initial assignments, and select the run with the highest likelihood.
Each cluster is associated with a probability distribution over text and each author is placed in a cluster with the best probabilistic fit for their language.
The maximum-likelihood solution is the clustering that assigns the greatest probability to all of the observed text.
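The hard-EM procedure with random restarts can be sketched as follows. This is a simplified stand-in, not the model described in these notes: it uses a smoothed multinomial per cluster in place of the Sparse Additive Generative Model, and the names `hard_em`, `alpha`, `restarts`, and `iters` are hypothetical.

```python
import math
import random

def hard_em(docs, k, vocab_size, restarts=25, iters=50, seed=0, alpha=0.1):
    """docs: list of word-count lists, each of length vocab_size."""
    rng = random.Random(seed)
    best_ll, best_assign = -math.inf, None
    for _ in range(restarts):
        assign = [rng.randrange(k) for _ in docs]  # random initial clustering
        ll = -math.inf
        for _ in range(iters):
            # M-step: smoothed word distribution and prior for each cluster.
            counts = [[alpha] * vocab_size for _ in range(k)]
            sizes = [1.0] * k  # add-one smoothing keeps every prior nonzero
            for doc, z in zip(docs, assign):
                sizes[z] += 1
                for v, c in enumerate(doc):
                    counts[z][v] += c
            log_word = []
            for row in counts:
                total = sum(row)
                log_word.append([math.log(c / total) for c in row])
            log_prior = [math.log(s / sum(sizes)) for s in sizes]
            # Hard E-step: place each author in the best-fitting cluster.
            new_assign, ll = [], 0.0
            for doc in docs:
                scores = [log_prior[j]
                          + sum(c * log_word[j][v] for v, c in enumerate(doc))
                          for j in range(k)]
                best = max(range(k), key=scores.__getitem__)
                new_assign.append(best)
                ll += scores[best]
            if new_assign == assign:
                break  # assignments stopped changing
            assign = new_assign
        # Keep the restart with the highest joint log-likelihood.
        if ll > best_ll:
            best_ll, best_assign = ll, assign
    return best_assign, best_ll
```

Individual restarts can get stuck, for instance with every author in one cluster, which is why only the highest-likelihood run is kept.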
That is, we’re comparing these clusters’ rates with the aggregated men’s rates. We’re reporting the clusters that are significantly different. (In other words, women in these three clusters are using lol-like words significantly less than men as a whole do. If women were really non-standard across the board, we wouldn’t expect any clusters to use less than the aggregated male number.)
These are some of the most popular “Shit X Say” videos.
Notice that “Gender” is not limited to the “gender” category: e.g., “girls” does not include “elderly women” and “Moms” doesn’t really include teenagers (even if they are young mothers).
I developed my ideas about positioning out of the data, but the metaphor is powerful and after I was mostly done, I found that Rom Harré and colleagues had made their own explorations/elaborations of “positioning”. We took different paths to a fairly similar end point. I’m happy to have my work be considered an extension of his.
“You” and “I” aren’t references to objects independent of time and space
They are momentary status updates (Harré, 1983)
Even when they aren’t explicitly there, you and I are there—our talk relates us to each other
1b: But the structure does impose constraints on interactions (Bourdieu, 1977; Butler, 1999; Giddens, 1984)
3: citation (Goffman 1981)
3b: People make use of conversational forms and strategies that are available to them (Harré, 1986; Vygotsky, 1962)
2b: Expectations are maintained
2c: People are enabled and constrained by these expectations
It’s a “female marker” in Twitter.
But categories are never simply descriptive; they are normative statements that draw lines around who is included and excluded (Butler 1990).
And we can’t trust the idea that we’ll just figure out each of the independent parts—if we figure out “woman” and “African American” then we’ll understand “African American women”.
----- Meeting Notes (4/21/14 10:03) -----
Sports teams and gender, Americanness, "doing sports teams” (not “doing gender”)
Online utterances: we can tailor, we can delete; how does that feed in to this?
Tried to take into account time of tweet, predictors of spelling out words
Apply techniques like LDA or hierarchical model, maybe person is drawing from clusters. Gender or cluster. People are distributions over topics.
Fisher topics vs. Twitter topics
Statistician asks about using clusters to predict gender, then realizes I’m moving the goalposts