Mining Python Software
Sarah Mount - @snim2
What do you want to know today?
What do we know about software?
● How to make it correct
● How long it will take to write
● Expected bugs per kLOC
Er … yeah.
Health warning...
This is a work in progress, so don’t take the numbers and charts too seriously just yet...
Options for mining Python software
<?xml version="1.0" encoding="UTF-8"?>
<response>
<status>success</status>
<result>
<project>
<id>1</id>
<name>Subversion</name>
<created_at>2006-10-10T15:51:31Z</created_at>
<updated_at>2007-08-22T17:31:17Z</updated_at>
<homepage_url>http://subversion.tigris.org/</homepage_url>
<download_url>http://subversion.tigris.org/...
</download_url>
<updated_at>2007-07-12T12:21:11Z</updated_at>
<logged_at>2007-07-12T12:18:54Z</logged_at>
<min_month>2001-08-01T00:00:00Z</min_month>
<max_month>2007-07-01T00:00:00Z</max_month>
...
{
"repository":{
"url":"https://github.com/igrigorik/spdy",
"has_downloads":false,
"created_at":"2012/01/19 14:15:34 -0800",
"has_issues":true,
"description":"SPDY is an experiment with protocols for the web",
"forks":10,
"fork":false,
"has_wiki":false,
"homepage":"http://www.igvita.com/2011/04/07/life-beyond-http-11-googles-spdy/",
"size":420,
"private":false,
"name":"spdy",
"owner":"igrigorik",
"open_issues":4,
"watchers":206,
"pushed_at":"2012/01/11 10:38:16 -0700",
"language":"Ruby"
},
"created_at":"2012/02/11 10:38:16 -0700",
"public":true,
"actor":"igrigorik",
"payload":{
"head":"98f44cab69becb274c6f3b9035ef8e0bd7b2b1b7",
"size":1,
...
],
"ref":"refs/heads/master"
},
"url":"https://github.com/igrigorik/spdy/compare/5b74597e88...98f44cab69b",
"type":"PushEvent"
}
Google BigQuery interface
/* top 100 repos for Ruby by number of pushes */
SELECT repository_name, count(repository_name) as pushes, repository_description,
repository_url
FROM [githubarchive:github.timeline]
WHERE type="PushEvent"
AND repository_language="Ruby"
AND PARSE_UTC_USEC(created_at) >= PARSE_UTC_USEC('2012-04-01 00:00:00')
GROUP BY repository_name, repository_description, repository_url
ORDER BY pushes DESC
LIMIT 100
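For small samples, the same aggregation can be sketched locally in Python. This is only an illustration, assuming each event is a dictionary shaped like the JSON sample above. The function name `top_repos_by_pushes` is hypothetical, and the date filter from the query is left out for brevity.

```python
from collections import Counter


def top_repos_by_pushes(events, language, limit=100):
    """Count PushEvents per repository name for one language.

    `events` is assumed to be an iterable of dicts with a "type" key and a
    nested "repository" dict, as in the GitHub Archive sample above.
    """
    pushes = Counter(
        event["repository"]["name"]
        for event in events
        if event.get("type") == "PushEvent"
        and event.get("repository", {}).get("language") == language
    )
    # most_common sorts by count, descending, like ORDER BY pushes DESC.
    return pushes.most_common(limit)
```

Calling `top_repos_by_pushes(events, "Python")` would be the natural variant for mining Python projects.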
Some preliminary work
Code clones
Type 1: Identical code, copy & pasted
Type 2: Identical code modulo names, layout, comments, etc.
Type 3: Type 2 plus further modifications, such as changes in statements
Type 4: Different code, same semantics
Roy & Cordy (2007)
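A minimal sketch of how Type 1 and Type 2 clones might be told apart for Python source, using the standard `ast` module. This is not the method used in the talk. The helper names (`fingerprint`, `clone_type`) are hypothetical, and the normalisation is deliberately crude: it blanks variable, argument and function names but leaves attribute names and literals alone. Types 3 and 4 are out of scope here.

```python
import ast


class _BlankNames(ast.NodeTransformer):
    """Replace identifiers with a placeholder so that names do not matter."""

    def visit_Name(self, node):
        return ast.copy_location(ast.Name(id="_", ctx=node.ctx), node)

    def visit_arg(self, node):
        node.arg = "_"
        return node

    def visit_FunctionDef(self, node):
        node.name = "_"
        self.generic_visit(node)
        return node


def fingerprint(source, ignore_names=False):
    """Return a string that is equal for structurally equal code.

    Parsing to an AST already discards layout and comments.
    """
    tree = ast.parse(source)
    if ignore_names:
        tree = _BlankNames().visit(tree)
    return ast.dump(tree)


def clone_type(a, b):
    """Return 1 or 2 for a Type 1 or Type 2 clone pair, else None."""
    if fingerprint(a) == fingerprint(b):
        return 1
    if fingerprint(a, ignore_names=True) == fingerprint(b, ignore_names=True):
        return 2
    return None
```

Comparing whole-file fingerprints only scales to small experiments; a real clone detector would compare fragments and hash them.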
Sentiment (in comments)
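One way to get at comment sentiment, sketched under heavy assumptions: pull comments out with the standard `tokenize` module and score them against a word list. The word lists below are tiny, made-up placeholders, not the lexicon behind the talk's charts, and `comments` and `sentiment` are hypothetical names.

```python
import io
import tokenize

# Hypothetical toy lexicons, for illustration only.
POSITIVE = {"good", "nice", "clean", "works"}
NEGATIVE = {"hack", "ugly", "broken", "bad"}


def comments(source):
    """Return the text of every comment in a piece of Python source."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [
        tok.string.lstrip("#").strip()
        for tok in tokens
        if tok.type == tokenize.COMMENT
    ]


def sentiment(comment):
    """Score a comment as (positive word count) minus (negative word count)."""
    words = [word.strip(".,!?:;").lower() for word in comment.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
```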
Some ideas for mining projects
Mining ideas
● How do programming idioms develop and spread?
● How do projects reach a critical mass of developers and become “popular”?
● Are metrics like cyclomatic complexity, fan-out and Halstead’s complexity measures useful, or are they all just proportional to kLOC?
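To probe the last question, one could compute a complexity figure and a line count for the same source and compare them across many projects. The sketch below approximates cyclomatic complexity by counting branch points in the AST. It is a rough approximation rather than a full McCabe implementation, and the function names are hypothetical.

```python
import ast

# Node types treated as one extra path each (an assumption of this sketch).
_BRANCHES = (
    ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp, ast.comprehension,
)


def cyclomatic_complexity(source):
    """Approximate cyclomatic complexity: 1 plus the number of branch points."""
    score = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, _BRANCHES):
            score += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" adds two extra short-circuit paths.
            score += len(node.values) - 1
    return score


def lines_of_code(source):
    """Count non-blank lines that are not pure comments."""
    stripped = (line.strip() for line in source.splitlines())
    return sum(1 for line in stripped if line and not line.startswith("#"))
```

Plotting `cyclomatic_complexity` against `lines_of_code` per file would show whether the two are roughly proportional.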
Thank you.

Mining python-software-pyconuk13