Mining python-software-pyconuk13

Published in: Technology

  1. Mining Python Software
     Sarah Mount - @snim2
  2. What do you want to know today?
     What do we know about software?
       ● How to make it correct
       ● How long it will take to write
       ● Expected bugs per kloc
     Er … yeah.
  3. Health warning... This is a work in progress, don’t take the numbers and charts too seriously just yet...
  4. Options for mining Python software
  5. <?xml version="1.0" encoding="UTF-8"?>
     <response>
       <status>success</status>
       <result>
         <project>
           <id>1</id>
           <name>Subversion</name>
           <created_at>2006-10-10T15:51:31Z</created_at>
           <updated_at>2007-08-22T17:31:17Z</updated_at>
           <homepage_url>http://subversion.tigris.org/</homepage_url>
           <download_url>http://subversion.tigris.org/... </download_url>
           <updated_at>2007-07-12T12:21:11Z</updated_at>
           <logged_at>2007-07-12T12:18:54Z</logged_at>
           <min_month>2001-08-01T00:00:00Z</min_month>
           <max_month>2007-07-01T00:00:00Z</max_month>
           ...
  6. {
       "repository":{
         "url":"https://github.com/igrigorik/spdy",
         "has_downloads":false,
         "created_at":"2012/01/19 14:15:34 -0800",
         "has_issues":true,
         "description":"SPDY is an experiment with protocols for the web",
         "forks":10,
         "fork":false,
         "has_wiki":false,
         "homepage":"http://www.igvita.com/2011/04/07/life-beyond-http-11-googles-spdy/",
         "size":420,
         "private":false,
         "name":"spdy",
         "owner":"igrigorik",
         "open_issues":4,
         "watchers":206,
         "pushed_at":"2012/01/11 10:38:16 -0700",
         "language":"Ruby"
       },
       "created_at":"2012/02/11 10:38:16 -0700",
       "public":true,
       "actor":"igrigorik",
       "payload":{
         "head":"98f44cab69becb274c6f3b9035ef8e0bd7b2b1b7",
         "size":1,
         ...
         ],
         "ref":"refs/heads/master"
       },
       "url":"https://github.com/igrigorik/spdy/compare/5b74597e88...98f44cab69b",
       "type":"PushEvent"
     }
  7. Google BigQuery interface
     /* top 100 repos for Ruby by number of pushes */
     SELECT repository_name, count(repository_name) as pushes,
            repository_description, repository_url
     FROM [githubarchive:github.timeline]
     WHERE type="PushEvent"
       AND repository_language="Ruby"
       AND PARSE_UTC_USEC(created_at) >= PARSE_UTC_USEC('2012-04-01 00:00:00')
     GROUP BY repository_name, repository_description, repository_url
     ORDER BY pushes DESC
     LIMIT 100
  8. Some preliminary work
  9. Code clones
       Type 1: Identical code, copy & pasted
       Type 2: Identical code modulo names, layout, comments, etc.
       Type 3: Type 2 plus further modifications such as changes in statements
       Type 4: Different code, same semantics
     Roy & Cordy (2007)
  10. Sentiment (in comments)
  11. Some ideas for mining projects
  12. Mining ideas
        ● How do programming idioms develop and spread?
        ● How do projects reach a critical mass of developers and become “popular”?
        ● Are metrics like cyclomatic complexity, fan out and Halstead’s complexity measure useful, or are they all just proportional to kLOCs?
  13. Thank you.
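
The push-count query on slide 7 can be approximated locally over event records shaped like the JSON on slide 6. The following is a minimal sketch, not the method used in the talk: it assumes each event is already available as a JSON string carrying the `type`, `repository.name` and `repository.language` fields shown on that slide. The function name `top_pushed_repos` and the sample events (including the repository name `example-repo`) are hypothetical, and the date filter from the query is left out.

```python
import json
from collections import Counter


def top_pushed_repos(events, language, limit=100):
    """Count PushEvents per repository name for one language.

    Roughly mirrors the GROUP BY / ORDER BY pushes DESC / LIMIT of the
    slide's query, without the created_at date filter.
    """
    pushes = Counter()
    for event in events:
        repo = event.get("repository", {})
        if event.get("type") == "PushEvent" and repo.get("language") == language:
            pushes[repo.get("name")] += 1
    # most_common() sorts by count, highest first
    return pushes.most_common(limit)


# Hypothetical sample data: one JSON event per string
sample_lines = [
    '{"type": "PushEvent", "repository": {"name": "spdy", "language": "Ruby"}}',
    '{"type": "PushEvent", "repository": {"name": "example-repo", "language": "Ruby"}}',
    '{"type": "PushEvent", "repository": {"name": "spdy", "language": "Ruby"}}',
    '{"type": "WatchEvent", "repository": {"name": "spdy", "language": "Ruby"}}',
    '{"type": "PushEvent", "repository": {"name": "py-thing", "language": "Python"}}',
]
events = [json.loads(line) for line in sample_lines]
ranking = top_pushed_repos(events, "Ruby")
```

Here `ranking` is a list of `(repository_name, pushes)` pairs; events of other types or in other languages are ignored.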

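
The first two clone types on slide 9 can be illustrated with Python's standard `ast` module. This is a rough sketch under stated assumptions, not the detector used in the talk: the names `fingerprint`, `clone_type` and `_Normalise` are hypothetical, the "Type 1" check compares syntax trees and so already ignores layout and comments, and the "Type 2" check renames every identifier (including built-ins) in order of first appearance, which can over-match. Types 3 and 4 are not attempted.

```python
import ast


class _Normalise(ast.NodeTransformer):
    """Replace function, argument and variable names with placeholders."""

    def __init__(self):
        self.names = {}

    def _placeholder(self, name):
        # The same original name always maps to the same placeholder
        return self.names.setdefault(name, f"v{len(self.names)}")

    def visit_FunctionDef(self, node):
        node.name = self._placeholder(node.name)
        self.generic_visit(node)
        return node

    def visit_arg(self, node):
        node.arg = self._placeholder(node.arg)
        return node

    def visit_Name(self, node):
        node.id = self._placeholder(node.id)
        return node


def fingerprint(source, rename=False):
    """Return a string form of the syntax tree, optionally name-normalised."""
    tree = ast.parse(source)
    if rename:
        tree = _Normalise().visit(tree)
    return ast.dump(tree)  # line and column numbers are omitted by default


def clone_type(a, b):
    """Return 1 or 2 for the clone type detected, or None if neither."""
    if fingerprint(a) == fingerprint(b):
        return 1
    if fingerprint(a, rename=True) == fingerprint(b, rename=True):
        return 2
    return None


original = "def total(xs):\n    s = 0\n    for x in xs:\n        s += x\n    return s\n"
renamed = (
    "def add_up(items):\n    acc = 0  # running sum\n"
    "    for item in items:\n        acc += item\n    return acc\n"
)
different = "def total(xs):\n    return max(xs)\n"
```

With these samples, `clone_type(original, renamed)` reports a Type 2 clone, while `clone_type(original, different)` reports none.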