
Data Science Challenges in Personal Program Analysis

Delivered by Bas van Schaik at the 2016 New York R Conference, held on April 8th and 9th at Work-Bench.

Published in: Data & Analytics

  1. 1. Data Science Challenges in Personal Program Analysis Bas van Schaik New York R Conference (April 2016)
  2. 2. - Cloud service for personal program analysis - Free for OSS projects - Currently in private beta, release imminent
  3. 3. Personal Program Analysis: why? We are passionate about code. We wish everyone would write better code. We help people build better software better.
  4. 4. Ehm… Program analysis? Compiler
  5. 5. What’s an ‘Alert’? Short answer: a bug or a violation of good coding practice. Example: define the same key twice in a Python dict. E.g. in OpenStack Designate:

        self.target = objects.PoolTarget.from_dict({
            'type': 'powerdns',
            'options': [{
                'key': 'connection', 'value': 'memory://',
                'key': 'host', 'value': '127.0.0.1',
                'key': 'port', 'value': 53}],
        })

     My guess of what was intended:

        self.target = objects.PoolTarget.from_dict({
            'type': 'powerdns',
            'options': [
                {'key': 'connection', 'value': 'memory://'},
                {'key': 'host', 'value': '127.0.0.1'},
                {'key': 'port', 'value': 53}],
        })
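To make the duplicate-key alert on slide 5 concrete, here is a minimal sketch of how such a check could be written with Python's standard `ast` module. It is an illustration only, not the analysis used by the service described in the talk; the function name `find_duplicate_dict_keys` and the `SNIPPET` input are assumptions made for this example.

```python
import ast


def find_duplicate_dict_keys(source: str):
    """Return (line, key) pairs for constant keys repeated within one dict literal."""
    alerts = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Dict):
            continue
        seen = set()
        for key in node.keys:
            # `key` is None for `**other` unpacking; only constant keys are compared.
            if isinstance(key, ast.Constant):
                if key.value in seen:
                    alerts.append((key.lineno, key.value))
                seen.add(key.value)
    return alerts


# Hypothetical input, modelled on the 'options' entry from the slide.
SNIPPET = """
options = [{
    'key': 'connection', 'value': 'memory://',
    'key': 'host', 'value': '127.0.0.1',
    'key': 'port', 'value': 53}]
"""

duplicates = find_duplicate_dict_keys(SNIPPET)
for line, key in duplicates:
    print(f"line {line}: key {key!r} is defined more than once in this dict")
```

At runtime Python silently keeps only the last value for each repeated key, which is why this mistake tends to go unnoticed without a static check.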
  6. 6. What’s an ‘Alert’? Alerts are found by queries: ● The source code is our database ● Every query result is an alert. Support for 10 different programming languages (and counting), with more than 1,000 queries and metrics in total.
  7. 7. What does a query look like?

        from Method m
        where m.hasName("hashcode") and m.hasNoParameters()
        select m, "Should this method be called 'hashCode' rather than 'hashcode'?"
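The idea from slides 6 and 7, that the source code is a database and every query result is an alert, can be mimicked in plain Python. The sketch below is an analogy only: the `Method` class and the in-memory `DATABASE` list are hypothetical stand-ins and say nothing about how the real query engine stores or evaluates code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Method:
    """Hypothetical record standing in for one method extracted from source code."""
    name: str
    parameters: tuple = ()

    def has_name(self, name: str) -> bool:
        return self.name == name

    def has_no_parameters(self) -> bool:
        return len(self.parameters) == 0


def misspelled_hashcode_query(methods):
    """Mirror of the slide's query: select methods named 'hashcode' with no parameters."""
    message = "Should this method be called 'hashCode' rather than 'hashcode'?"
    return [
        (m, message)
        for m in methods
        if m.has_name("hashcode") and m.has_no_parameters()
    ]


# Hypothetical "database" of methods.
DATABASE = [
    Method("hashCode"),
    Method("hashcode"),
    Method("hashcode", ("int",)),
    Method("equals", ("Object",)),
]

alerts = misspelled_hashcode_query(DATABASE)
for method, message in alerts:
    print(method.name, "->", message)
```

Each tuple returned by the query plays the role of one alert: a code element plus a message explaining what looks wrong.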
  8. 8. Making it interesting: project over time. [Figure: OpenStack Nova (Python) over time; panels: net alerts, activity composition, net LOC]
  9. 9. Or: compare different projects. [Figure: alerts versus LOC for the OpenStack projects Cinder, Nova, Neutron, Horizon, Heat, Swift, Sahara, Glance, Designate, Keystone, Fuel, and Ironic]
  10. 10. Even more interesting: make it personal. [Figure: net alerts versus net LOC contributed (all OpenStack modules); labels A, B, and X]
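One plausible reading of the per-person figure on slide 10 is that each commit contributes a change in lines of code and a change in alert count, and these are summed per contributor. The sketch below works under that assumption; the `Commit` fields, the definitions of "net LOC" and "net alerts", and the sample data are all hypothetical and are not taken from the talk.

```python
from collections import defaultdict
from typing import NamedTuple


class Commit(NamedTuple):
    """Hypothetical per-commit summary; field names are assumptions."""
    author: str
    lines_added: int
    lines_removed: int
    alerts_introduced: int
    alerts_fixed: int


def net_contributions(commits):
    """Sum net LOC and net alerts per author (assumed: net = added minus removed)."""
    totals = defaultdict(lambda: {"net_loc": 0, "net_alerts": 0})
    for c in commits:
        totals[c.author]["net_loc"] += c.lines_added - c.lines_removed
        totals[c.author]["net_alerts"] += c.alerts_introduced - c.alerts_fixed
    return dict(totals)


# Made-up data for illustration only.
COMMITS = [
    Commit("alice", 500, 100, 4, 1),
    Commit("alice", 200, 50, 1, 0),
    Commit("bob", 50, 300, 0, 6),
    Commit("carol", 1000, 0, 12, 0),
]

per_author = net_contributions(COMMITS)
for author, totals in per_author.items():
    print(author, totals)
```

Under these definitions a contributor who mostly deletes code and fixes alerts ends up with negative values on both axes, while a contributor who adds a lot of new code with many alerts ends up far into the positive range.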
  11. 11. Data Science for PPA: finding fun facts. [Figure: Who's doing what in OpenStack? Axes: total contributors, % contributors; categories: Trailblazer, Bug squasher, Refactorer, None; marker: Major release]
  12. 12. Data science for PPA: cleaning. [Figures: PostgreSQL net churn and net alerts, before cleaning; PostgreSQL, after cleaning]
  13. 13. Warning: DEMO of beta software
  14. 14. But… why make it personal? Some developers not so happy: “are you questioning my ability to write code?” No. We're helping you to improve.
  15. 15. But… why make it personal? By making it personal, we make people care. When people care, they improve. When developers improve, the code improves.
  16. 16. But… why make it personal? When developers improve, the code improves. ● Automated code review on GitHub pull requests ● “On 12/11/2015 you introduced X, fancy fixing that?” ● “You recently fixed alert A in file B. Based on your expertise, you might also be interested in fixing alert X in file Y?” ● “Compared to developers like you, you rank 20 out of 100” ● “… and by fixing these 5 alerts, you'll be in the top 10!” ● Found a bug in your project? Write a query for it, share it!
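The ranking messages on slide 16 ("you rank 20 out of 100", "by fixing these 5 alerts, you'll be in the top 10") could be computed along the following lines. This is a sketch under stated assumptions, not the product's scoring: it assumes developers are ranked by net alerts with fewer being better, that ties share the better rank, and that each fix lowers a developer's net alerts by one. The function names and sample scores are hypothetical.

```python
def rank_of(developer, net_alerts):
    """1-based rank; fewer net alerts ranks higher, ties share the better rank."""
    mine = net_alerts[developer]
    return 1 + sum(1 for score in net_alerts.values() if score < mine)


def fixes_needed_for_top(developer, net_alerts, top_n):
    """Smallest number of additional fixed alerts that puts `developer` in the top `top_n`."""
    others = sorted(score for name, score in net_alerts.items() if name != developer)
    if top_n > len(others):
        return 0  # the group is small enough that any rank is already within top_n
    # Being in the top_n means at most top_n - 1 others have strictly fewer net alerts,
    # which holds once the developer matches the score at index top_n - 1.
    target = others[top_n - 1]
    return max(0, net_alerts[developer] - target)


# Made-up scores for illustration only.
SCORES = {"you": 9, "a": 2, "b": 4, "c": 7, "d": 12}

current_rank = rank_of("you", SCORES)
needed = fixes_needed_for_top("you", SCORES, top_n=2)
print(f"You rank {current_rank} out of {len(SCORES)}.")
print(f"By fixing {needed} alerts, you'll be in the top 2.")
```

Letting ties share the better rank keeps the message encouraging: matching a peer's score is enough to join them, rather than having to beat them.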
  17. 17. Not rocket science… Or is it?
  18. 18. DEMO (continued)
  19. 19. Interested in… Early access to CodingStars? Having your OSS project analysed? Working for us in New York, San Francisco, Oxford (UK), or Copenhagen (Denmark)? Talk to us! (in person, or bas@semmle.com)
