Gender, Representation and Online Participation: a Quantitative Study
Dr Andrea Capiluppi
30 Oct 2013
Dept of Information Systems and Computing (DISC)
My research background
• Software engineering
– Software maintenance & evolution
– Software architectures, components & reuse
– Effort estimation
– Quantitative studies

• Open processes
– Open source products
– Social networks
• Wikipedia
• Q&A sites
The Fastest Q&A Site in the West
• StackOverflow is a “Question & Answer site for
programmers”
– Part of the StackExchange network

• Most questions are answered
– StackOverflow (92.6%)
– Yahoo! Answers (88.2%)
– KiN (~66%)

• Median answer time of only 11 minutes!
Mamykina, L., Manoim, B., Mittal, M., Hripcsak, G., & Hartmann, B. (2011, May).
Design lessons from the fastest q&a site in the west. In Proceedings of the SIGCHI
conference on Human factors in computing systems (pp. 2857-2866). ACM.
Game Mechanisms in SO
• SO is based on points
– Reputation points
• Good answer
• Good comment
• Good question
• ...

– Badges
• Popular Question
• Commentator
• Necromancer
• …

– Privileges: more points give access to more features
• Voting
• Commenting
• Editing
How this work started
• Major conference, paper painting the awesomeness
of StackOverflow
Lotufo, R., Passos, L., & Czarnecki, K.
(2012, June). Towards improving bug
tracking systems with game mechanisms.
In Mining Software Repositories (MSR),
2012 9th IEEE Working Conference on
(pp. 2-11). IEEE.
How this work started
• Paper was well received
• Questions from the audience:
– is SO attracting a male-only crowd?

• Wider questions:
– Are prizes, badges, reputation creating an unbalanced
participation?
– Is “gaming” lethal for a social network? Making it less
sustainable?
Anecdotal evidence...
A bit of a touchy topic...

Regarding the FLOSS community as a
whole, have you ever observed
discriminatory behaviour against women?

FLOSSPOLS Deliverable D16 Gender: Integrated Report of Findings.
http://www.flosspols.org/deliverables/D16HTML/FLOSSPOLSD16Gender_Integrated_Report_of_Findings.htm, 2006.
Demoted skills
• Online status and reputation: 'pro' and 'rookie'
– Technical skills: coding, debugging, etc.
– Non-technical skills: usability, web design, etc.

• (…) the skill of web design was demoted to a ‘non-technical’ status as it became a way in which women
described and approached their work [Kotamraju 2003]

Kotamraju, N. 2003. Art versus Code: The Gendered
Evolution of Web Design Skills. In Howard, P. and S. Jones
(eds) Society Online: The Internet in Context. London: Sage.
Recognised widespread issue
Aim of the study
• Provide quantifiable evidence of gender
participation and engagement
– Is gender ratio unbalanced?
– Is gender engagement unbalanced?

• Data sampling: Q&A sites
– StackOverflow
– WordPress
– Drupal
1) What is your gender?
2) What do you do on a Q&A site?
/ SET / W&I

14/11/13

PAGE 12
Research questions:
• RQ1: What are the challenges with identifying gender
in online communities?
• RQ2: What is the rate of participation by women in
online communities?
• RQ3: What is the level of engagement by women in
online communities?
… (trying to) avoid moralistic messages
Q&A
Empirical approach
• Data mining/Name extraction
• Gender resolution
• Detection of activity on
– StackOverflow
– Drupal
– WordPress

• Statistical comparison between genders
Data and name extraction
• StackOverflow public data dump
– 1,078,708 registered users
– Too much noise to automatically assign gender
– Random sampling
• 2% margin of error
• 99% confidence level
• Subset of 4,144 SO users
• Manual gender resolution
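
The sampling figures above can be checked with a short calculation. This is a minimal sketch, assuming the sample size came from Cochran's formula with a finite population correction; the slides do not say which formula was used. The critical value z = 2.58 and the proportion p = 0.5 are assumptions as well.

```python
def sample_size(population, margin=0.02, z=2.58, p=0.5):
    """Cochran's sample size with a finite population correction.

    Assumptions (not stated on the slide): z = 2.58 as the rounded
    critical value for 99% confidence, p = 0.5 as the most conservative
    guess for the unknown proportion.
    """
    n0 = (z ** 2) * p * (1 - p) / margin ** 2   # size for an infinite population
    return n0 / (1 + (n0 - 1) / population)     # shrink for a finite population


if __name__ == "__main__":
    # 1,078,708 registered users is the figure given on the slide.
    print(round(sample_size(1_078_708)))
```

Under these assumptions the result lands within one user of the 4,144 quoted on the slide, which suggests (but does not prove) that a calculation of this kind was used.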
Data and name extraction II
• Drupal and WordPress
mailing lists
– Both separate Q&A into
various sub-lists
• Consulting
• Development
• Support
• …

– Name, Surname, email
address, text of email,
<<in_response_to>> tag
– All messages & authors
analysed
– Manual gender resolution
Gender resolution
What is your gender?
Gender resolution II
What is your gender?

?
Gender resolution III
What is your gender?
Gender resolution IV
What is your gender?

Name + Location = Gender
Lonzo ⇒ Alonzo

w35l3y ⇒ wesley

Name + Location = Gender
Heuristics: title + first h1
<title>Ben Kamens</title>
…
<h1>We&#8217;re willing to be embarrassed about what we <em>haven&#8217;t</em> done&#8230;</h1>

Ben Kamens We’re willing to be embarrassed about what we haven’t done…
Stanford Named Entity Tagger
<PERSON>Ben Kamens</PERSON> We’re willing to be embarrassed about what we haven’t done…
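
The "title + first h1" heuristic can be sketched with Python's standard `html.parser` module. This is an illustration only, not the tool used in the study: the class and function names are hypothetical, and the named-entity tagging step (Stanford Named Entity Tagger on the slide) is left out.

```python
from html.parser import HTMLParser


class TitleH1Extractor(HTMLParser):
    """Collects the text of <title> and of the first <h1> only."""

    def __init__(self):
        # convert_charrefs=True turns &#8217; and similar into characters.
        super().__init__(convert_charrefs=True)
        self._current = None          # "title", "h1" or None
        self._h1_done = False         # set once the first <h1> has closed
        self.parts = {"title": [], "h1": []}

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._current = "title"
        elif tag == "h1" and not self._h1_done:
            self._current = "h1"

    def handle_endtag(self, tag):
        if tag == self._current:
            if tag == "h1":
                self._h1_done = True
            self._current = None

    def handle_data(self, data):
        # Nested tags such as <em> do not change _current, so their
        # text is kept while the tags themselves are dropped.
        if self._current is not None:
            self.parts[self._current].append(data)


def title_and_first_h1(html_text):
    """Return '<title text> <first h1 text>' with whitespace normalised."""
    parser = TitleH1Extractor()
    parser.feed(html_text)
    parser.close()
    pieces = ("".join(parser.parts[key]) for key in ("title", "h1"))
    return " ".join(" ".join(piece.split()) for piece in pieces if piece.strip())
```

The returned string is the kind of plain text that would then be passed to a named-entity tagger to pick out the person's name.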
Automatic gender resolution
• Python tool developed

Name, Country ⇒ Gender {masculine, feminine, x}
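
A toy version of the "Name + Location = Gender" idea is sketched below. Everything in it is hypothetical: the lookup table, the character substitutions and the function name are made up for illustration and are not taken from the Python tool mentioned on the slide. Only the two normalisation examples (Lonzo, w35l3y) come from the deck.

```python
# Hypothetical digit-to-letter substitutions for "leet" spellings.
LEET = str.maketrans({"0": "o", "1": "l", "3": "e", "4": "a", "5": "s", "7": "t"})

# Hypothetical nickname expansions (the Lonzo example is from the slides).
NICKNAMES = {"lonzo": "alonzo"}

# Hypothetical toy table keyed by (first name, ISO country code).
# A country of None means "same answer wherever the person is".
NAME_TABLE = {
    ("andrea", "IT"): "masculine",   # Andrea is a male name in Italy
    ("andrea", "DE"): "feminine",    # and usually a female name in Germany
    ("alonzo", None): "masculine",
    ("wesley", None): "masculine",
}


def resolve_gender(name, country=None):
    """Return 'masculine', 'feminine' or 'x' (unresolved)."""
    tokens = name.split()
    if not tokens:
        return "x"
    first = tokens[0].lower().translate(LEET)
    first = NICKNAMES.get(first, first)
    # Prefer a country-specific entry, then fall back to a global one.
    for key in ((first, country), (first, None)):
        if key in NAME_TABLE:
            return NAME_TABLE[key]
    return "x"
```

Returning "x" when neither a country-specific nor a global entry exists keeps ambiguous names such as a location-less "Andrea" out of both gender groups, at the cost of a larger unresolved share.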
Quality of gender resolution: Survey

Self-identification    As inferred           Total
                       M      F      ?
M                      60     3      43      106
F                      2      5      4       11

+ avatars, other social media sites (manually)

Self-identification    As inferred           Total
                       M      F      ?
M                      90     3      13      106
F                      2      9      0       11
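
The two survey tables can be summarised with a few lines of arithmetic. This sketch assumes the first table is the tool's output alone and the second is the result after the manual checks (avatars, other social media sites); the slide layout suggests this but does not state it. The variable and function names are illustrative.

```python
# Rows: self-identified gender. Columns: inferred gender (M, F, ? = unresolved).
# Counts are copied from the survey tables on the slide.
FIRST_TABLE = {"M": {"M": 60, "F": 3, "?": 43}, "F": {"M": 2, "F": 5, "?": 4}}
SECOND_TABLE = {"M": {"M": 90, "F": 3, "?": 13}, "F": {"M": 2, "F": 9, "?": 0}}


def summarise(table):
    """Share of respondents inferred correctly and share left unresolved."""
    total = sum(sum(row.values()) for row in table.values())
    correct = sum(table[gender][gender] for gender in table)
    unresolved = sum(row["?"] for row in table.values())
    return {
        "total": total,
        "correct": correct / total,
        "unresolved": unresolved / total,
    }


if __name__ == "__main__":
    for label, table in (("first", FIRST_TABLE), ("second", SECOND_TABLE)):
        print(label, summarise(table))
```

Both tables cover the same 117 respondents, so the two summaries are directly comparable.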
Hypothesis testing

• Three-way testing {masculine, feminine, x}
• Mann-Whitney test (skewness of data)
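
For reference, the Mann-Whitney U statistic compares two groups through ranks rather than means, which is why it suits skewed activity counts. Below is a minimal pure-Python sketch of the statistic only (no p-value); the sample values in the usage example are made up and are not data from the study.

```python
def mann_whitney_u(x, y):
    """Return (U_x, U_y) for two samples, using average ranks for ties."""
    # Pool both samples, remembering which group each value came from.
    pooled = sorted((value, group) for group, sample in enumerate((x, y))
                    for value in sample)
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values occupy ranks i+1 .. j and all get the average rank.
        average_rank = (i + 1 + j) / 2
        rank_sum_x += average_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n_x, n_y = len(x), len(y)
    u_x = rank_sum_x - n_x * (n_x + 1) / 2
    return u_x, n_x * n_y - u_x


if __name__ == "__main__":
    # Hypothetical answers-per-user counts for two groups.
    group_a = [0, 1, 1, 2, 40]
    group_b = [0, 0, 1, 3, 5, 120]
    print(mann_whitney_u(group_a, group_b))
```

In practice a library routine such as `scipy.stats.mannwhitneyu` would normally be used, since it also reports the p-value.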
[Figure: gender counts for the three data sets, one of them labelled “sample”: 2,296 / 291 / 1,557; 3,043 / 282 / 286; 2,879 / 328 / 135]
7-10% women as opposed to
1-5% for Open Source and
up to 28% for proprietary
7-10% on different mailing lists
more on “use technology”
less on “design technology”
It is easy to remain anonymous on SO and
participants use this opportunity (37.5%)
No significant differences in #questions, #answers, length of engagement

Affects engagement for “design tech.” lists
Engage for longer
Ask more questions
No diff in #answers

Women can contribute to SO but choose not to!
Why?
• [Gneezy, Niederle, Rustichini 2003]: women are less
effective in mixed-gender competitive environments
• [Niederle, Vesterlund 2007]: women shy away from
competition and men embrace it
• To retain women we need different gamification
techniques
Threats to validity
• Gender inference:
• Automated: Imprecise tooling
• Manual: Errare humanum est

• Gender swapping
• Images of other people as avatars
• Celebrities, children, porn stars…
Future work…
• Roles: coders, translators, UI designers
– Similar to different mailing lists in Drupal/WordPress
– Activity (commits) rather than discussion

• Output: code, bugs, …
Name + Location = Gender
Questions?
● Vasilescu, B., Capiluppi, A., Serebrenik, A. (2012): Gender, Representation and Online Participation: A Quantitative Study of StackOverflow. Social Informatics (SocialInformatics), 2012 International Conference on, pp. 332-338
● Vasilescu, B., Capiluppi, A., Serebrenik, A. (2013): Men at work: the StackOverflow case. Tiny Transactions on Computer Science, 2
● Vasilescu, B., Capiluppi, A., Serebrenik, A. (2013): Gender, Representation and Online Participation: A Quantitative Study. Interacting with Computers 2013; doi: 10.1093/iwc/iwt047

Editor's Notes

  1. Advantages: controlled sample. Disadvantages: representative? In any case: direction for future work.
  2. However, what is common to both Drupal and WordPress is that the differences in gender participation occur mostly between mailing lists focussing on designing technology (development, wp-hackers and wp-xmlrpc) and using technology (consulting, wp-docs and wp-edu).