Gender, Representation and Online Participation: a Quantitative Study

Dr Andrea Capiluppi
30 Oct 2013
Dept of Information Systems and Computing (DISC)

My research background
• Software engineering
– Software maintenance & evolution
– Software architectures, components & reuse
– Effort estimation
– Quantitative studies

• Open processes
– Open source products
– Social networks
• Wikipedia
• Q&A sites

The Fastest Q&A Site in the West
• StackOverflow is a “Question & Answer site for
programmers”
– Part of the StackExchange network

• Most questions are answered
– StackOverflow (92.6%)
– Yahoo! Answers (88.2%)
– KiN (~66%)

• Median answer time of only 11 minutes!
Mamykina, L., Manoim, B., Mittal, M., Hripcsak, G., & Hartmann, B. (2011, May).
Design lessons from the fastest q&a site in the west. In Proceedings of the SIGCHI
conference on Human factors in computing systems (pp. 2857-2866). ACM.

Game Mechanisms in SO
• SO is based on points
– Reputation points
• Good answer
• Good comment
• Good question
• ...

– Badges
• Popular Question
• Commentator
• Necromancer
• …

– Privileges: more points give access to more features
• Voting
• Commenting
• Editing

How this work started
• Major conference, paper painting the awesomeness
of StackOverflow
Lotufo, R., Passos, L., & Czarnecki, K.
(2012, June). Towards improving bug
tracking systems with game mechanisms.
In Mining Software Repositories (MSR),
2012 9th IEEE Working Conference on
(pp. 2-11). IEEE.

How this work started
• Paper was well received
• Questions from the audience:
– is SO attracting a male-only crowd?

• Wider questions:
– Are prizes, badges, reputation creating an unbalanced
participation?
– Is “gaming” lethal for a social network? Making it less
sustainable?

A bit of a touchy topic...

Regarding the FLOSS community as a
whole, have you ever observed
discriminatory behaviour against women?

FLOSSPOLS Deliverable D16, Gender: Integrated Report of Findings.
http://www.flosspols.org/deliverables/D16HTML/FLOSSPOLSD16Gender_Integrated_Report_of_Findings.htm, 2006.

Demoted skills
• Online status and reputation: 'pro' and 'rookie'
– Technical skills: coding, debugging, etc.
– Non-technical skills: usability, web design, etc.

• (…) the skill of web design was demoted to a ‘non-technical’ status as it became a way in which women
described and approached their work [Kotamraju 2003]

Kotamraju, N. 2003. Art versus Code: The Gendered
Evolution of Web Design Skills. In Howard, P. and S. Jones
(eds) Society Online: The Internet in Context. London: Sage.

Aim of the study
• Provide quantifiable evidence of gender
participation and engagement
– Is gender ratio unbalanced?
– Is gender engagement unbalanced?

• Data sampling: Q&A sites
– StackOverflow
– WordPress
– Drupal

1) What is your gender?
2) What do you do on a Q&A site?
/ SET / W&I

14/11/13


Research questions:
• RQ1: What are the challenges with identifying gender
in online communities?
• RQ2: What is the rate of participation by women in
online communities?
• RQ3: What is the level of engagement by women in
online communities?
… (trying to) avoid moralistic messages

Empirical approach
• Data mining/Name extraction
• Gender resolution
• Detection of activity on
– StackOverflow
– Drupal
– WordPress

• Statistical comparison between genders

Data and name extraction
• StackOverflow public data dump
– 1,078,708 registered users
– Too much noise to automatically assign gender
– Random sampling
• 2% margin of error
• 99% confidence level
• Subset of 4,144 SO users
• Manual gender resolution

Data and name extraction II
• Drupal and WordPress
mailing lists
– Both separate Q&A into
various sub-lists
• Consulting
• Development
• Support
• …

– Name, Surname, email
address, text of email,
<<in_response_to>> tag
– All messages & authors
analysed
– Manual gender resolution

Gender resolution (I–IV)
What is your gender?

Name + Location = Gender

Lonzo ⇒ Alonzo

w35l3y ⇒ wesley
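
The two rewrites above (a short form expanded to a full name, and digits read back as letters) can be sketched as a small normalisation step. This is an illustration only: the substitution table and the nickname dictionary are hypothetical and are not taken from the study's tooling.

```python
# Illustrative tables only: neither mapping comes from the study.
LEET = str.maketrans({"0": "o", "1": "l", "3": "e", "4": "a", "5": "s", "7": "t"})
NICKNAMES = {"lonzo": "alonzo"}  # hypothetical short-form -> full-name lookup

def normalise_name(raw):
    """Lower-case, undo digit-for-letter substitutions, expand known short forms."""
    name = raw.strip().lower().translate(LEET)
    return NICKNAMES.get(name, name)

print(normalise_name("w35l3y"))
print(normalise_name("Lonzo"))
```

A name that is in neither table passes through unchanged, so the step can run before any gender lookup.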


Heuristics: title + first h1
<title>Ben Kamens</title>
…
<h1>We’re willing to be embarrassed about what we <em>haven’t</em> done…</h1>

Ben Kamens We’re willing to be embarrassed about what we haven’t done…

Stanford Named Entity Tagger
<PERSON>Ben Kamens</PERSON> We’re willing to be embarrassed about what we haven’t done…
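
The first half of this heuristic, joining the page title with the text of the first h1, can be sketched with Python's standard `html.parser`. The class and function names are illustrative assumptions, and the named-entity tagging step is not reproduced here.

```python
from html.parser import HTMLParser

class TitleAndFirstH1(HTMLParser):
    """Collect the text inside <title> and inside the first <h1> only."""

    def __init__(self):
        super().__init__()
        self._current = None   # "title" or "h1" while inside one of them
        self._h1_done = False  # set once the first <h1> has been closed
        self.parts = {"title": [], "h1": []}

    def handle_starttag(self, tag, attrs):
        if tag == "title" or (tag == "h1" and not self._h1_done):
            self._current = tag

    def handle_endtag(self, tag):
        if tag == self._current:
            if tag == "h1":
                self._h1_done = True
            self._current = None

    def handle_data(self, data):
        if self._current:
            self.parts[self._current].append(data)

def title_plus_first_h1(html):
    parser = TitleAndFirstH1()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts["title"]) + " " + "".join(parser.parts["h1"])
    return " ".join(text.split())  # collapse whitespace

page = (
    "<title>Ben Kamens</title>"
    "<h1>We’re willing to be embarrassed about what we "
    "<em>haven’t</em> done…</h1>"
    "<h1>A second heading that is ignored</h1>"
)
print(title_plus_first_h1(page))
```

Inline tags such as `<em>` are dropped while their text is kept, which yields the plain string that would then be passed to a named-entity tagger.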

Automatic gender resolution
• Python tool developed

Name, Country ⇒ Gender {masculine, feminine, x}
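
The mapping from a (name, country) pair to one of three labels can be illustrated with a toy lookup. This is a sketch under stated assumptions, not the tool mentioned on the slide: the table, its entries and the function name are all hypothetical.

```python
# Hypothetical lookup table keyed by (first name, country code); a real tool
# would need per-country name lists far larger than this.
NAME_TABLE = {
    ("andrea", "IT"): "masculine",
    ("andrea", "DE"): "feminine",
    ("wesley", "US"): "masculine",
}

def resolve_gender(first_name, country):
    """Return 'masculine', 'feminine', or 'x' when the pair cannot be resolved."""
    key = (first_name.strip().lower(), country.strip().upper())
    return NAME_TABLE.get(key, "x")

print(resolve_gender("Andrea", "it"))
print(resolve_gender("Andrea", "de"))
print(resolve_gender("w35l3y", "us"))
```

The same first name resolves differently by country, which is why location is part of the key; anything outside the table falls back to x.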

Quality of gender resolution: Survey

Self-identification    As inferred            Total
                       M      F      ?
M                      60     3      43      106
F                      2      5      4       11

+ avatars, other social media sites (manually)

Self-identification    As inferred            Total
                       M      F      ?
M                      90     3      13      106
F                      2      9      0       11
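
Reading the survey counts as rows of self-identified gender against columns of inferred gender (M, F, ?), the share of correctly resolved and of unresolved respondents can be computed directly. This is a minimal sketch; the variable and function names are illustrative.

```python
# Rows: self-identified gender; columns: inferred as M, F, or unresolved (?).
FIRST_PASS = {"M": {"M": 60, "F": 3, "?": 43}, "F": {"M": 2, "F": 5, "?": 4}}
WITH_MANUAL = {"M": {"M": 90, "F": 3, "?": 13}, "F": {"M": 2, "F": 9, "?": 0}}

def summarise(table):
    """Return total respondents plus the correct and unresolved fractions."""
    total = sum(sum(row.values()) for row in table.values())
    correct = sum(table[g][g] for g in table)
    unresolved = sum(row["?"] for row in table.values())
    return {"total": total,
            "accuracy": correct / total,
            "unresolved": unresolved / total}

for label, table in (("first pass", FIRST_PASS), ("with manual checks", WITH_MANUAL)):
    s = summarise(table)
    print(label, s["total"], round(s["accuracy"], 3), round(s["unresolved"], 3))
```

On these counts, 65 of 117 respondents are resolved correctly in the first table and 99 of 117 in the second, while the unresolved group shrinks from 47 to 13.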

Hypothesis testing

• Three-way testing {masculine, feminine, x}
• Mann-Whitney test (skewness of data)
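
A pairwise comparison of three groups with the Mann-Whitney U test can be sketched in pure Python. The activity counts below are invented for illustration, and the p-value uses the normal approximation without tie or continuity correction, so this is a rough sketch rather than the study's procedure.

```python
from itertools import combinations
from statistics import NormalDist

def mann_whitney_u(xs, ys):
    """Return (U statistic for xs, two-sided p-value by normal approximation)."""
    combined = sorted((v, g) for g, vals in enumerate((xs, ys)) for v in vals)
    rank_sum_x, i, n = 0.0, 0, len(combined)
    while i < n:
        j = i
        while j < n and combined[j][0] == combined[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of ranks i+1 .. j shared by tied values
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if combined[k][1] == 0)
        i = j
    n1, n2 = len(xs), len(ys)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    sd = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5
    p = 2 * (1 - NormalDist().cdf(abs((u_x - mean) / sd)))
    return u_x, p

# Hypothetical, skewed activity counts per user (not data from the study).
groups = {
    "masculine": [0, 1, 1, 2, 3, 5, 8, 40],
    "feminine": [0, 0, 1, 1, 2, 4, 6, 25],
    "x": [0, 0, 0, 1, 1, 2, 3, 9],
}
for a, b in combinations(groups, 2):
    u, p = mann_whitney_u(groups[a], groups[b])
    print(a, "vs", b, "U =", u, "p =", round(p, 3))
```

A rank-based test suits this kind of data because a few very active users would dominate a comparison of means.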


[Figure: sample counts, in three groups of three: 2,296 / 291 / 1,557; 3,043 / 282 / 286; 2,879 / 328 / 135]


7-10% women as opposed to
1-5% for Open Source and
up to 28% for proprietary


7-10% on different mailing lists
more on “use technology”
less on “design technology”


It is easy to remain anonymous on SO and
participants use this opportunity (37.5%)


No significant differences in #questions, #answers, length of engagement

Affects engagement for “design tech.” lists


Engage for longer
Ask more questions
No diff in #answers

Women can
contribute to SO
but choose not to!

Why?
• [Gneezy, Niederle, Rustichini 2003]: women are less effective in mixed-gender competitive environments
• [Niederle, Vesterlund 2007]: women shy away from competition and men embrace it
• To retain women we need different gamification techniques

Threats to validity
• Gender inference:
– Automated: imprecise tooling
– Manual: errare humanum est
• Gender swapping
• Images of other people as avatars
– Celebrities, children, porn stars…

Future work…
• Roles: coders, translators, UI designers
– Similar to diff mailing lists in Drupal/WordPress
– Activity (commits) rather than discussion
• Output: code, bugs, …


Questions?
● Vasilescu, B., Capiluppi, A., Serebrenik, A. (2012): Gender, Representation and Online Participation: A Quantitative Study of StackOverflow. Social Informatics (SocialInformatics), 2012 International Conference on, pp. 332-338
● (2013): Men at work: the StackOverflow case. Tiny Transactions on Computer Science, 2
● (2013): Gender, Representation and Online Participation: A Quantitative Study. Interacting with Computers 2013; doi: 10.1093/iwc/iwt047
