Mining Python Software
Sarah Mount - @snim2
Mining python-software-pyconuk13
Published in: Technology
Transcript of "Mining python-software-pyconuk13"

  1. Mining Python Software
     Sarah Mount - @snim2
  2. What do you want to know today?
     What do we know about software?
     ● How to make it correct
     ● How long it will take to write
     ● Expected bugs per kloc
     Er … yeah.
  3. Health warning...
     This is a work in progress, don’t take the numbers and charts too seriously just yet...
  4. Options for mining Python software
  5. <?xml version="1.0" encoding="UTF-8"?>
     <response>
       <status>success</status>
       <result>
         <project>
           <id>1</id>
           <name>Subversion</name>
           <created_at>2006-10-10T15:51:31Z</created_at>
           <updated_at>2007-08-22T17:31:17Z</updated_at>
           <homepage_url>http://subversion.tigris.org/</homepage_url>
           <download_url>http://subversion.tigris.org/... </download_url>
           <updated_at>2007-07-12T12:21:11Z</updated_at>
           <logged_at>2007-07-12T12:18:54Z</logged_at>
           <min_month>2001-08-01T00:00:00Z</min_month>
           <max_month>2007-07-01T00:00:00Z</max_month>
           ...
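A response shaped like the one on slide 5 can be read with the standard library alone. The following is a minimal sketch, not the speaker's code: `SAMPLE` is a hypothetical, shortened and closed-off version of the slide's XML (the real response is truncated on the slide, so its closing structure is an assumption), and `parse_projects` is an illustrative helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed miniature of the slide's response. The slide is
# truncated, so the closing tags and the choice of fields are assumptions.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<response>
  <status>success</status>
  <result>
    <project>
      <id>1</id>
      <name>Subversion</name>
      <created_at>2006-10-10T15:51:31Z</created_at>
      <homepage_url>http://subversion.tigris.org/</homepage_url>
    </project>
  </result>
</response>
"""


def parse_projects(xml_text):
    """Return one dict per <project> element in a successful response."""
    # Encode first so the parser honours the encoding declaration.
    root = ET.fromstring(xml_text.encode("utf-8"))
    if root.findtext("status") != "success":
        raise ValueError("response did not report success")
    projects = []
    for project in root.iter("project"):
        projects.append({
            "id": int(project.findtext("id")),
            "name": project.findtext("name"),
            "created_at": project.findtext("created_at"),
            "homepage_url": project.findtext("homepage_url"),
        })
    return projects


projects = parse_projects(SAMPLE)
print(projects)
```

Keeping the timestamps as strings avoids guessing at a date format policy; they could be parsed later if a study needs project ages.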
  6. {
       "repository":{
         "url":"https://github.com/igrigorik/spdy",
         "has_downloads":false,
         "created_at":"2012/01/19 14:15:34 -0800",
         "has_issues":true,
         "description":"SPDY is an experiment with protocols for the web",
         "forks":10,
         "fork":false,
         "has_wiki":false,
         "homepage":"http://www.igvita.com/2011/04/07/life-beyond-http-11-googles-spdy/",
         "size":420,
         "private":false,
         "name":"spdy",
         "owner":"igrigorik",
         "open_issues":4,
         "watchers":206,
         "pushed_at":"2012/01/11 10:38:16 -0700",
         "language":"Ruby"
       },
       "created_at":"2012/02/11 10:38:16 -0700",
       "public":true,
       "actor":"igrigorik",
       "payload":{
         "head":"98f44cab69becb274c6f3b9035ef8e0bd7b2b1b7",
         "size":1,
         ... ],
         "ref":"refs/heads/master"
       },
       "url":"https://github.com/igrigorik/spdy/compare/5b74597e88...98f44cab69b",
       "type":"PushEvent"
     }
  7. Google BigQuery interface
     /* top 100 repos for Ruby by number of pushes */
     SELECT repository_name, count(repository_name) as pushes,
            repository_description, repository_url
     FROM [githubarchive:github.timeline]
     WHERE type="PushEvent"
       AND repository_language="Ruby"
       AND PARSE_UTC_USEC(created_at) >= PARSE_UTC_USEC('2012-04-01 00:00:00')
     GROUP BY repository_name, repository_description, repository_url
     ORDER BY pushes DESC
     LIMIT 100
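The same aggregation as the query on slide 7 can be sketched offline in Python over event records shaped like the JSON on slide 6. This is an illustration under assumptions, not the talk's code: `EVENT_LINES` is made-up sample data, `top_pushed_repos` is a hypothetical helper, and for brevity it groups by repository name only, whereas the query also groups by description and URL.

```python
import json
from collections import Counter
from datetime import datetime, timezone

# Timestamp layout as it appears in the slide's JSON, e.g.
# "2012/01/19 14:15:34 -0800".
STAMP_FORMAT = "%Y/%m/%d %H:%M:%S %z"

# Made-up events, one JSON document per line, keeping only the fields used.
EVENT_LINES = [
    '{"type": "PushEvent", "created_at": "2012/04/02 10:00:00 -0700", "repository": {"name": "spdy", "language": "Ruby"}}',
    '{"type": "PushEvent", "created_at": "2012/04/03 09:30:00 -0700", "repository": {"name": "spdy", "language": "Ruby"}}',
    '{"type": "PushEvent", "created_at": "2012/04/05 12:00:00 -0700", "repository": {"name": "rails", "language": "Ruby"}}',
    '{"type": "WatchEvent", "created_at": "2012/04/06 12:00:00 -0700", "repository": {"name": "spdy", "language": "Ruby"}}',
    '{"type": "PushEvent", "created_at": "2012/04/07 12:00:00 -0700", "repository": {"name": "flask", "language": "Python"}}',
    '{"type": "PushEvent", "created_at": "2012/03/01 12:00:00 -0700", "repository": {"name": "rails", "language": "Ruby"}}',
]


def top_pushed_repos(lines, language, since, limit=100):
    """Count PushEvents per repository for one language, newest-first cutoff."""
    pushes = Counter()
    for line in lines:
        event = json.loads(line)
        repo = event.get("repository", {})
        if event.get("type") != "PushEvent" or repo.get("language") != language:
            continue
        when = datetime.strptime(event["created_at"], STAMP_FORMAT)
        if when < since:
            continue  # mirrors the created_at >= '2012-04-01' condition
        pushes[repo["name"]] += 1
    # most_common sorts by count, descending, like ORDER BY pushes DESC.
    return pushes.most_common(limit)


cutoff = datetime(2012, 4, 1, tzinfo=timezone.utc)
ranking = top_pushed_repos(EVENT_LINES, "Ruby", cutoff)
print(ranking)
```

Swapping `"Ruby"` for `"Python"` gives the Python-specific ranking that the talk's title suggests.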
  8. Some preliminary work
  9. Code clones
     Type 1: Identical code, copy & pasted
     Type 2: Identical code modulo names, layout, comments, etc.
     Type 3: Type 2 plus further modifications such as changes in statements
     Type 4: Different code, same semantics
     Roy & Cordy (2007)
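To make the first two clone types on slide 9 concrete, here is a rough sketch that follows the slide's wording: Type 1 is treated as textually identical code, and Type 2 as code whose token stream matches once comments and layout are dropped and every non-keyword name is replaced by a placeholder. The functions `normalized_tokens` and `classify` are hypothetical names for illustration; this is not a real clone detector and makes no attempt at Types 3 or 4.

```python
import io
import keyword
import tokenize


def normalized_tokens(source):
    """Token stream with comments and layout removed and names abstracted."""
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in (tokenize.COMMENT, tokenize.NL, tokenize.ENDMARKER):
            continue  # comments and blank lines are layout only
        if tok.type in (tokenize.NEWLINE, tokenize.INDENT, tokenize.DEDENT):
            # Keep block structure but ignore the exact whitespace used.
            tokens.append(tokenize.tok_name[tok.type])
        elif tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            tokens.append("ID")  # any identifier counts as the same token
        else:
            tokens.append(tok.string)
    return tokens


def classify(a, b):
    """Label a pair of snippets as a Type 1 clone, a Type 2 clone, or neither."""
    if a == b:
        return "type 1"
    if normalized_tokens(a) == normalized_tokens(b):
        return "type 2"
    return "no clone found"


A = "def total(xs):\n    s = 0\n    for x in xs:\n        s += x\n    return s\n"
B = ("def add_all(values):  # sum a list\n  acc = 0\n"
     "  for v in values:\n    acc += v\n  return acc\n")
C = "def total(xs):\n    return sum(xs)\n"

print(classify(A, A), classify(A, B), classify(A, C))
```

Snippets `A` and `C` compute the same result with different code, which is the Type 4 case; recognising that would need semantic analysis well beyond token comparison.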
  10. Sentiment (in comments)
  11. Some ideas for mining projects
  12. Mining ideas
      ● How do programming idioms develop and spread?
      ● How do projects reach a critical mass of developers and become “popular”?
      ● Are metrics like cyclomatic complexity, fan out and Halstead’s complexity measure useful, or are they all just proportional to kLOCs?
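One way to start on the last question is to compute a complexity metric and a line count for each function and compare them across a corpus. The sketch below approximates cyclomatic complexity as one plus the number of decision points found in the syntax tree; the set of node types counted is a simplifying assumption, nested functions are counted in their parent as well, and `measure` is a hypothetical helper rather than anything from the talk. It needs Python 3.8 or later for `end_lineno`.

```python
import ast

# Node types treated as decision points; a simplification of McCabe's metric.
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.IfExp, ast.ExceptHandler)


def cyclomatic_complexity(func_node):
    """Approximate complexity: 1 + decision points inside the function."""
    complexity = 1
    for node in ast.walk(func_node):
        if isinstance(node, BRANCH_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" adds two extra paths.
            complexity += len(node.values) - 1
    return complexity


def measure(source):
    """Map each function name to (lines of code, approximate complexity)."""
    results = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            loc = node.end_lineno - node.lineno + 1
            results[node.name] = (loc, cyclomatic_complexity(node))
    return results


SAMPLE = '''def straight(a, b):
    c = a + b
    d = c * 2
    return d

def branchy(x):
    if x < 0 and x != -1:
        return "neg"
    for i in range(x):
        if i % 2:
            return "odd"
    return "done"
'''

metrics = measure(SAMPLE)
print(metrics)
```

Collecting these pairs over many projects and plotting complexity against lines of code would show whether the metric carries information beyond size.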
  13. Thank you.