
Extracting Data from your Open Source Communities

Open source communities are filled with huge amounts of data just waiting to be analyzed. Getting this data into a format that can be easily used for analysis may seem intimidating at first, but there are some very useful open source tools that make this task relatively easy. The primary tools used in this talk are the open source Metrics Grimoire tools that take data from various community sources and store it in a database where it can be easily queried and analyzed.

This talk will cover:
* CVSAnalY to gather and analyze source code repository data
* MLStats to gather and analyze mailing list data
* Other Metrics Grimoire tools for bug trackers, IRC, wikis, and more
* Gource to visualize source code repository data

The goal is for people to walk away with some basic techniques and tools that they can use to begin using the data from open source communities.

Published in: Technology

  1. EXTRACTING DATA FROM YOUR OPEN SOURCE COMMUNITIES
     Dawn M. Foster
     @geekygirldawn, dawn@fastwonder.com, fastwonderblog.com
     PhD Student, University of Greenwich
     Consultant, The Scale Factory
  2. WHOAMI
     • Geek, traveler, reader
     • 20 year tech career. Past 15 years doing community & open source (Intel, Puppet Labs, etc.)
     • PhD student at University of Greenwich researching Linux kernel
     • Community and open source consultant at The Scale Factory
     Photos by Josh Bancroft, Don Park
  3. I 💖 METRICS GRIMOIRE
     MailingListStats aka MLStats
     CVSAnalY - repos
     Bicho - bugs
     More
     Photo by Bitergia
     http://metricsgrimoire.github.io/
  4. MLSTATS AND CVSANALY
     a) Install
        $ python setup.py install
     b) Create database
        mysql> create database mlstats;
        mysql> create database cvsanaly;
     c) Import data
        $ mlstats http://URLOFYOURLIST
        $ cvsanaly2 /path/to/repo
  5. MLSTATS: EXTRACT DATA
     Top 100 messages (most replied to threads):
        SELECT subject, COUNT(*) AS total
        FROM messages
        GROUP BY subject
        ORDER BY total DESC
        LIMIT 100;
     Other queries:
        # of messages from a specific person
        # of messages per person from email domain
        Find all messages with specific word in subject line (patch)
  6. CVSANALY: EXTRACT DATA
     Number of commits per person by email domain:
        SELECT p.name, p.email,
        COUNT(distinct(s.id)) AS num_commits
        FROM people p, scmlog s
        WHERE email LIKE "%company.com"
        AND p.id=s.author_id
        GROUP BY email
        ORDER BY num_commits DESC;
     Other queries:
        Top commit authors all time
        # of commits for specific person
  7. OTHER GRIMOIRE OPTIONS
     Bug data
     Wikis
     IRC
     Aggregate across tools
     Photo by Bitergia
  8. GOURCE
     Visualize repository data using Gource
     http://gource.io/
  9. Dawn Foster
     PhD student, University of Greenwich
     Consultant, The Scale Factory
     @geekygirldawn, dawn@dawnfoster.com, fastwonderblog.com
     THANK YOU
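
The two queries on slides 5 and 6 can be tried out without a MySQL server. The following is a minimal sketch, not the talk's own code: it swaps in Python's built-in sqlite3 module with an in-memory database, and the toy tables are simplified stand-ins that contain only the columns the slide queries mention (`messages.subject`, `people.id/name/email`, `scmlog.id/author_id`). The sample rows, the function names `build_toy_db`, `top_threads`, and `commits_by_domain`, the explicit `JOIN`, and the tighter `%@domain` pattern are all assumptions made for illustration. Real MLStats and CVSAnalY databases have more tables and columns.

```python
import sqlite3

# Slide 5: most frequent subjects, used as a proxy for most-replied-to threads.
TOP_THREADS_SQL = """
SELECT subject, COUNT(*) AS total
FROM messages
GROUP BY subject
ORDER BY total DESC
LIMIT ?
"""

# Slide 6: commits per person, restricted to one email domain.
# Written with an explicit JOIN instead of the comma join on the slide.
COMMITS_BY_DOMAIN_SQL = """
SELECT p.name, p.email, COUNT(DISTINCT s.id) AS num_commits
FROM people p
JOIN scmlog s ON p.id = s.author_id
WHERE p.email LIKE ?
GROUP BY p.name, p.email
ORDER BY num_commits DESC
"""


def build_toy_db():
    """Create an in-memory database with hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE messages (subject TEXT);
        CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
        CREATE TABLE scmlog (id INTEGER PRIMARY KEY, author_id INTEGER);
        """
    )
    conn.executemany(
        "INSERT INTO messages VALUES (?)",
        [
            ("[PATCH] fix typo",),
            ("[PATCH] fix typo",),
            ("[PATCH] fix typo",),
            ("Release plan",),
            ("Release plan",),
            ("Hello",),
        ],
    )
    conn.executemany(
        "INSERT INTO people VALUES (?, ?, ?)",
        [
            (1, "Ada", "ada@company.com"),
            (2, "Bob", "bob@company.com"),
            (3, "Cy", "cy@example.org"),
        ],
    )
    conn.executemany(
        "INSERT INTO scmlog VALUES (?, ?)",
        [(1, 1), (2, 1), (3, 1), (4, 2), (5, 3), (6, 3)],
    )
    return conn


def top_threads(conn, limit=100):
    """Return (subject, message_count) rows, most frequent subject first."""
    return conn.execute(TOP_THREADS_SQL, (limit,)).fetchall()


def commits_by_domain(conn, domain):
    """Return (name, email, num_commits) for authors at the given domain."""
    # "%@domain" avoids also matching addresses such as x@notcompany.com.
    return conn.execute(COMMITS_BY_DOMAIN_SQL, ("%@" + domain,)).fetchall()


if __name__ == "__main__":
    db = build_toy_db()
    for row in top_threads(db, limit=2):
        print(row)
    for row in commits_by_domain(db, "company.com"):
        print(row)
```

Against a real MLStats or CVSAnalY database, the same SQL text could be sent through a MySQL client library instead; the placeholder style (`?` here) depends on the driver. Passing the domain and limit as bound parameters rather than pasting them into the query string also sidesteps quoting differences between database engines.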