
Extracting Data from your Open Source Communities

Open source communities generate huge amounts of data just waiting to be analyzed. Getting this data into a format that is easy to analyze may seem intimidating at first, but several open source tools make the task relatively easy. This talk focuses on the open source Metrics Grimoire tools, which collect data from community sources such as source code repositories, mailing lists, and bug trackers, and store it in a database where it can be easily queried and analyzed.

This talk covers:
* CVSAnalY to gather and analyze source code repository data
* MLStats to gather and analyze mailing list data
* Other Metrics Grimoire tools for bug trackers, IRC, wikis, and more
* Gource to visualize source code repository data

The goal is for attendees to walk away with basic techniques and tools they can use to start working with data from their own open source communities.

  1. EXTRACTING DATA FROM YOUR OPEN SOURCE COMMUNITIES
     Dawn M. Foster, @geekygirldawn
     dawn@fastwonder.com | fastwonderblog.com
     PhD Student, University of Greenwich
     Consultant, The Scale Factory
  2. WHOAMI
     • Geek, traveler, reader
     • 20-year tech career; the past 15 years doing community and open source work (Intel, Puppet Labs, etc.)
     • PhD student at the University of Greenwich researching the Linux kernel
     • Community and open source consultant at The Scale Factory
     Photos by Josh Bancroft, Don Park
  3. I 💖 METRICS GRIMOIRE
     • MailingListStats, aka MLStats: mailing lists
     • CVSAnalY: repos
     • Bicho: bugs
     • More: http://metricsgrimoire.github.io/
     Photo by Bitergia
  4. MLSTATS AND CVSANALY
     a) Install each tool from its source directory:
          $ python setup.py install
     b) Create the databases:
          mysql> create database mlstats;
          mysql> create database cvsanaly;
     c) Import data:
          $ mlstats http://URLOFYOURLIST
          $ cvsanaly2 /path/to/repo
  5. MLSTATS: EXTRACT DATA
     Top 100 threads (subjects with the most messages, a rough proxy for the most replied-to threads):
          SELECT subject, COUNT(*) AS total
          FROM messages
          GROUP BY subject
          ORDER BY total DESC
          LIMIT 100;
     Other queries:
     • # of messages from a specific person
     • # of messages per person from an email domain
     • All messages with a specific word (e.g., "patch") in the subject line
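The "other queries" on this slide translate directly into SQL. Below is a minimal sketch that runs them through Python's built-in sqlite3 module against a tiny in-memory database. The tables and columns (messages, people, messages_people, email_address, domain_name, type_of_recipient) are assumptions modeled on a simplified subset of the MLStats schema, and the sample rows are made up; check the names against your own MLStats database (usually MySQL) before reusing the SQL.

```python
import sqlite3

# ASSUMPTION: a simplified subset of the MLStats schema. The real schema
# (in MySQL) has more tables and columns. Sample rows are invented.
SCHEMA = """
CREATE TABLE messages (message_id TEXT PRIMARY KEY, subject TEXT);
CREATE TABLE people (email_address TEXT PRIMARY KEY, domain_name TEXT);
CREATE TABLE messages_people (
    type_of_recipient TEXT,  -- 'From', 'To' or 'Cc'
    message_id TEXT,
    email_address TEXT
);
"""


def build_sample_db():
    """Create an in-memory database with a few illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO messages VALUES (?, ?)", [
        ("m1", "[PATCH] fix build"),
        ("m2", "Re: [PATCH] fix build"),
        ("m3", "Release schedule"),
    ])
    conn.executemany("INSERT INTO people VALUES (?, ?)", [
        ("alice@example.com", "example.com"),
        ("bob@example.org", "example.org"),
    ])
    conn.executemany("INSERT INTO messages_people VALUES (?, ?, ?)", [
        ("From", "m1", "alice@example.com"),
        ("From", "m2", "bob@example.org"),
        ("From", "m3", "alice@example.com"),
        ("To", "m1", "bob@example.org"),
    ])
    return conn


def messages_from_person(conn, email):
    """Number of messages sent (not received) by one address."""
    sql = """SELECT COUNT(*) FROM messages_people
             WHERE type_of_recipient = 'From' AND email_address = ?"""
    return conn.execute(sql, (email,)).fetchone()[0]


def messages_per_person_in_domain(conn, domain):
    """(email, message count) for senders in one domain, busiest first."""
    sql = """SELECT mp.email_address, COUNT(*) AS total
             FROM messages_people mp
             JOIN people p ON p.email_address = mp.email_address
             WHERE mp.type_of_recipient = 'From' AND p.domain_name = ?
             GROUP BY mp.email_address
             ORDER BY total DESC"""
    return conn.execute(sql, (domain,)).fetchall()


def messages_with_word_in_subject(conn, word):
    """Subjects containing a word; SQLite's LIKE ignores ASCII case."""
    sql = "SELECT subject FROM messages WHERE subject LIKE ? ORDER BY subject"
    return [row[0] for row in conn.execute(sql, ("%" + word + "%",))]
```

For example, messages_with_word_in_subject(conn, "patch") returns both the original [PATCH] message and its reply. The ? placeholders are sqlite3 syntax; many Python MySQL drivers use %s instead.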
  6. CVSANALY: EXTRACT DATA
     Number of commits per person for one email domain:
          SELECT p.name, p.email,
                 COUNT(DISTINCT s.id) AS num_commits
          FROM people p
          JOIN scmlog s ON p.id = s.author_id
          WHERE p.email LIKE '%company.com'
          GROUP BY p.name, p.email
          ORDER BY num_commits DESC;
     Other queries:
     • Top commit authors of all time
     • # of commits for a specific person
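The remaining CVSAnalY queries follow the same pattern as the domain query on this slide. This is a minimal sketch, again using sqlite3 with an in-memory database; the people and scmlog tables are a simplified, assumed subset of the CVSAnalY schema (only the columns the slide's query uses), and the rows are invented for illustration. If one developer commits under several email addresses, these queries count each address separately.

```python
import sqlite3

# ASSUMPTION: a simplified subset of the CVSAnalY schema; sample rows are invented.
SCHEMA = """
CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
CREATE TABLE scmlog (id INTEGER PRIMARY KEY, author_id INTEGER, message TEXT);
"""


def build_sample_db():
    """Create an in-memory database with a few illustrative commits."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO people VALUES (?, ?, ?)", [
        (1, "Alice", "alice@company.com"),
        (2, "Bob", "bob@example.org"),
    ])
    conn.executemany("INSERT INTO scmlog VALUES (?, ?, ?)", [
        (10, 1, "Add parser"),
        (11, 1, "Fix parser"),
        (12, 2, "Update docs"),
        (13, 1, "Add tests"),
    ])
    return conn


def commits_per_person_in_domain(conn, pattern="%company.com"):
    """The slide's query: commits per person whose email matches a LIKE pattern."""
    sql = """SELECT p.name, p.email, COUNT(DISTINCT s.id) AS num_commits
             FROM people p
             JOIN scmlog s ON p.id = s.author_id
             WHERE p.email LIKE ?
             GROUP BY p.name, p.email
             ORDER BY num_commits DESC"""
    return conn.execute(sql, (pattern,)).fetchall()


def top_commit_authors(conn, limit=10):
    """(name, email, commit count) for the most active authors of all time."""
    sql = """SELECT p.name, p.email, COUNT(DISTINCT s.id) AS num_commits
             FROM people p
             JOIN scmlog s ON s.author_id = p.id
             GROUP BY p.id, p.name, p.email
             ORDER BY num_commits DESC
             LIMIT ?"""
    return conn.execute(sql, (limit,)).fetchall()


def commits_for_person(conn, email):
    """Number of commits authored by one email address (0 if none)."""
    sql = """SELECT COUNT(DISTINCT s.id)
             FROM people p
             JOIN scmlog s ON s.author_id = p.id
             WHERE p.email = ?"""
    return conn.execute(sql, (email,)).fetchone()[0]
```

Grouping by every selected non-aggregated column keeps the query valid under MySQL's ONLY_FULL_GROUP_BY mode as well as in SQLite.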
  7. OTHER GRIMOIRE OPTIONS
     • Bug data
     • Wikis
     • IRC
     • Aggregate across tools
     Photo by Bitergia
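Aggregating across tools mostly comes down to deciding which records belong to the same person. A minimal sketch, assuming you already have per-person counts from CVSAnalY and MLStats as email-to-count mappings and that an email address is a good enough identity key (often it is not, since people use several addresses). aggregate_activity is a hypothetical helper, not part of Metrics Grimoire.

```python
from collections import defaultdict


def aggregate_activity(commit_counts, message_counts):
    """Merge per-email counts from two tools into one table.

    commit_counts / message_counts: dicts mapping email -> count, e.g. the
    results of per-person CVSAnalY and MLStats queries (ASSUMED inputs).
    Emails are trimmed and lowercased so the same address matches across tools.
    Returns {email: {"commits": n, "messages": m}}.
    """
    merged = defaultdict(lambda: {"commits": 0, "messages": 0})
    for email, n in commit_counts.items():
        merged[email.strip().lower()]["commits"] += n
    for email, n in message_counts.items():
        merged[email.strip().lower()]["messages"] += n
    return dict(merged)
```

People who appear in only one tool still show up, with a zero count for the other, which makes it easy to spot contributors who commit but never post (or the reverse).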
  8. GOURCE
     Visualize repository data using Gource: http://gource.io/
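Gource reads Git (and several other version control) histories directly, so `gource /path/to/repo` is enough for a local Git checkout. For activity data from elsewhere, such as rows exported from a CVSAnalY database, Gource also accepts a custom pipe-delimited log (timestamp|username|type|file, where type is A, M, or D). The sketch below converts hypothetical (unix_timestamp, author, action, path) tuples into that format; the tuple layout is an assumption, not something CVSAnalY produces as-is.

```python
def to_gource_log(events):
    """Convert (unix_timestamp, author, action, path) tuples into Gource's
    custom log format: "timestamp|username|type|file", one entry per line.

    Only A (added), M (modified) and D (deleted) are valid Gource types, so
    other actions are skipped. Entries are sorted by timestamp because Gource
    expects the log in chronological order.
    """
    lines = []
    for ts, author, action, path in sorted(events, key=lambda e: e[0]):
        if action not in ("A", "M", "D"):
            continue
        # '|' is the field separator, so replace it in the author name.
        author = author.replace("|", " ")
        lines.append(f"{int(ts)}|{author}|{action}|{path}")
    return "\n".join(lines) + ("\n" if lines else "")
```

Write the result to a file (for example commits.log) and run `gource --log-format custom commits.log`.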
  9. THANK YOU
     Dawn Foster
     PhD student, University of Greenwich
     Consultant, The Scale Factory
     @geekygirldawn, dawn@dawnfoster.com
     fastwonderblog.com
