Extracting Data from your Open Source Communities

Dawn Foster, Director of Open Source Community Strategy
EXTRACTING DATA FROM YOUR OPEN SOURCE COMMUNITIES
Dawn M. Foster
@geekygirldawn
dawn@fastwonder.com
fastwonderblog.com
PhD Student, University of Greenwich
Consultant, The Scale Factory
WHOAMI
• Geek, traveler, reader
• 20-year tech career, the past 15 years in community and open source (Intel, Puppet Labs, etc.)
• PhD student at University of Greenwich researching the Linux kernel
• Community and open source consultant at The Scale Factory
Photos by Josh Bancroft, Don Park
I 💖 METRICS GRIMOIRE
MailingListStats aka MLStats
CVSAnalY - repos
Bicho - bugs
More
Photo by Bitergia
http://metricsgrimoire.github.io/
MLSTATS AND CVSANALY
a) Install
$ python setup.py install
b) Create databases
mysql> create database mlstats;
mysql> create database cvsanaly;
c) Import data
$ mlstats http://URLOFYOURLIST
$ cvsanaly2 /path/to/repo
MLSTATS: EXTRACT DATA
Top 100 messages (most-replied-to threads):
SELECT subject, COUNT(*) AS total
FROM messages
GROUP BY subject
ORDER BY total DESC
LIMIT 100;
Other queries:
• Number of messages from a specific person
• Number of messages per person from an email domain
• All messages with a specific word in the subject line (e.g. "patch")
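The MLStats queries listed above can be tried out on a toy database before pointing them at real data. The sketch below is a minimal illustration only: it uses an in-memory SQLite database instead of MySQL, and apart from `messages.subject` (which appears in the slide's query) every table name, column name, and sample row is an assumption made for the example. Check the names against your own MLStats database before reusing the SQL.

```python
import sqlite3

# Toy stand-in for an MLStats database. Only messages.subject comes from the
# slide; messages_people and its columns are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE messages (message_id TEXT PRIMARY KEY, subject TEXT);
    CREATE TABLE messages_people (
        message_id TEXT, email_address TEXT, type_of_recipient TEXT
    );
    """
)

# Hypothetical sample data: (message id, subject, sender address).
sample = [
    ("m1", "[PATCH] fix typo", "alice@company.com"),
    ("m2", "[PATCH] fix typo", "bob@example.org"),
    ("m3", "[PATCH] fix typo", "alice@company.com"),
    ("m4", "Release plan", "carol@company.com"),
    ("m5", "Release plan", "alice@company.com"),
    ("m6", "Meeting notes", "bob@example.org"),
]
conn.executemany(
    "INSERT INTO messages VALUES (?, ?)", [(m, s) for m, s, _ in sample]
)
conn.executemany(
    "INSERT INTO messages_people VALUES (?, ?, 'From')",
    [(m, e) for m, _, e in sample],
)

# The slide's query: subjects with the most messages.
top_threads = conn.execute(
    "SELECT subject, COUNT(*) AS total FROM messages "
    "GROUP BY subject ORDER BY total DESC LIMIT 100"
).fetchall()

# Number of messages from a specific person.
from_person = conn.execute(
    "SELECT COUNT(*) FROM messages_people "
    "WHERE type_of_recipient = 'From' AND email_address = ?",
    ("alice@company.com",),
).fetchone()[0]

# Number of messages per person from one email domain.
per_person = conn.execute(
    "SELECT email_address, COUNT(*) AS total FROM messages_people "
    "WHERE type_of_recipient = 'From' AND email_address LIKE ? "
    "GROUP BY email_address ORDER BY total DESC",
    ("%@company.com",),
).fetchall()

# All messages with a specific word in the subject line.
patch_messages = conn.execute(
    "SELECT message_id, subject FROM messages "
    "WHERE subject LIKE ? ORDER BY message_id",
    ("%patch%",),
).fetchall()

print(top_threads)
print(from_person)
print(per_person)
print(patch_messages)
```

Passing the person, domain, and search word as query parameters rather than pasting them into the SQL string keeps the same queries reusable for different people and companies.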
CVSANALY: EXTRACT DATA
Number of commits per person by email domain:
SELECT p.name, p.email,
COUNT(DISTINCT s.id) AS num_commits
FROM people p, scmlog s
WHERE p.email LIKE "%company.com"
AND p.id = s.author_id
GROUP BY p.email
ORDER BY num_commits DESC;
Other queries:
• Top commit authors of all time
• Number of commits for a specific person
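The CVSAnalY queries above can be sketched the same way. This is a minimal illustration under stated assumptions: it runs against an in-memory SQLite database rather than MySQL, it reuses only the table and column names that appear in the slide's query (`people.id`, `people.name`, `people.email`, `scmlog.id`, `scmlog.author_id`), and the sample rows are invented. A real CVSAnalY database has more tables and columns than shown here.

```python
import sqlite3

# Toy stand-in for a CVSAnalY database, limited to the columns the slide uses.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
    CREATE TABLE scmlog (id INTEGER PRIMARY KEY, author_id INTEGER);
    """
)

# Hypothetical sample data.
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [
        (1, "Alice", "alice@company.com"),
        (2, "Bob", "bob@example.org"),
        (3, "Carol", "carol@company.com"),
    ],
)
conn.executemany(
    "INSERT INTO scmlog VALUES (?, ?)",
    [(1, 1), (2, 1), (3, 1), (4, 2), (5, 2), (6, 3)],
)

# The slide's query: commits per person for one email domain.
by_domain = conn.execute(
    "SELECT p.name, p.email, COUNT(DISTINCT s.id) AS num_commits "
    "FROM people p JOIN scmlog s ON p.id = s.author_id "
    "WHERE p.email LIKE ? "
    "GROUP BY p.name, p.email ORDER BY num_commits DESC",
    ("%company.com",),
).fetchall()

# Top commit authors of all time (top 10 here).
top_authors = conn.execute(
    "SELECT p.name, COUNT(DISTINCT s.id) AS num_commits "
    "FROM people p JOIN scmlog s ON p.id = s.author_id "
    "GROUP BY p.id, p.name ORDER BY num_commits DESC LIMIT 10"
).fetchall()

# Number of commits for a specific person.
commits_for_person = conn.execute(
    "SELECT COUNT(*) FROM scmlog s JOIN people p ON p.id = s.author_id "
    "WHERE p.email = ?",
    ("bob@example.org",),
).fetchone()[0]

print(by_domain)
print(top_authors)
print(commits_for_person)
```

One caveat worth checking on real data: the same person often commits under several names or email addresses, so per-person counts may need those identities merged first.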
OTHER GRIMOIRE OPTIONS
Bug data
Wikis
IRC
Aggregate across tools
Photo by Bitergia
GOURCE
Visualize repository data using Gource

http://gource.io/
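Gource reads version control logs directly, and it also accepts a pipe-delimited custom log format (`timestamp|username|type|file`, where type is A, M, or D). The sketch below shows one possible way to produce such lines from data you have already extracted. The function name and the sample changes are hypothetical and are not taken from the slides.

```python
from datetime import datetime, timezone

def to_gource_line(when, author, action, path):
    """Format one change as a Gource custom-log line: timestamp|user|type|file."""
    if action not in ("A", "M", "D"):
        raise ValueError("action must be A (added), M (modified) or D (deleted)")
    if "|" in author or "|" in path:
        raise ValueError("fields must not contain the '|' delimiter")
    return f"{int(when.timestamp())}|{author}|{action}|{path}"

# Hypothetical sample changes: (time, author, action, file path).
changes = [
    (datetime(2015, 1, 2, tzinfo=timezone.utc), "Bob", "M", "docs/README"),
    (datetime(2015, 1, 1, tzinfo=timezone.utc), "Alice", "A", "docs/README"),
]

# Gource expects entries in chronological order, so sort by time first.
log_lines = [to_gource_line(*change) for change in sorted(changes, key=lambda c: c[0])]
print("\n".join(log_lines))
```

Saved to a file, the output can be passed to Gource using its `--log-format custom` option.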
Dawn Foster
PhD student, University of Greenwich
Consultant, The Scale Factory
@geekygirldawn, dawn@dawnfoster.com
fastwonderblog.com
THANK YOU