
TravisTorrent: Synthesizing Travis CI and GitHub for Full-Stack Research on Continuous Integration


Continuous Integration (CI) has become a best practice of modern software development. Thanks in part to its tight integration with GitHub, Travis CI has emerged as arguably the most widely used CI platform for Open-Source Software (OSS) development. However, despite its prominent role in software engineering practice, the benefits, costs, and implications of doing CI are far from clear from an academic standpoint. Little research has been done, and even less of it was quantitative in nature. To lay the groundwork for data-driven research on CI, we built TravisTorrent (travistorrent.testroots.org), a freely available data set based on Travis CI and GitHub that provides easy access to hundreds of thousands of analyzed builds from more than 1,000 projects. Unique to TravisTorrent is that each of its 2,640,825 Travis build jobs is synthesized with metadata from Travis CI’s API, the results of analyzing its textual build log, a link to the GitHub commit that triggered the build, and dynamically aggregated project data from the time of the commit, extracted through GHTorrent.
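For readers who want to get hands-on with the data set, here is a minimal sketch of loading a local CSV export and summarizing build outcomes per project. This is not part of the TravisTorrent tooling itself: the file name travistorrent.csv is a placeholder for whatever snapshot you download, and the only columns used are ones named in the slides below (gh_project_name, git_branch, gh_is_pr, tr_status, tr_log_tests_ran, tr_log_tests_failed).

# Minimal sketch: load a local TravisTorrent CSV snapshot with pandas and
# compute the share of non-passing builds per project.
# "travistorrent.csv" is a placeholder path, not an official file name.
import pandas as pd

builds = pd.read_csv("travistorrent.csv")

# Keep a few of the columns documented on the slides below.
cols = ["gh_project_name", "git_branch", "gh_is_pr",
        "tr_status", "tr_log_tests_ran", "tr_log_tests_failed"]
builds = builds[cols]

# tr_status follows the Travis CI build-state vocabulary
# (values such as 'passed', 'failed', 'errored', 'canceled').
broken_ratio = (
    builds.assign(broken=builds["tr_status"].ne("passed"))
          .groupby("gh_project_name")["broken"]
          .mean()
          .sort_values(ascending=False)
)
print(broken_ratio.head(10))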



  1-2. MSR 2017 Mining Challenge: Moritz Beller (@Inventitech), Georgios Gousios, Andy Zaidman
  3-6. How TravisTorrent Came to Be. Tomorrow, Sunday, 11:00-12:30, Cine, ground floor
  7-18. Travis 101
     ● Projects: 1,200
     ● Builds: 680,209
     ● Build result
     ● Duration
     ● Languages: Java, Ruby
     ● Jobs: 2+ million
     ● Build (job) logs (up to 50 MB each!): 1 Terabyte (sorry for downloading it thrice, Travis CI!)
     ● Link to Git(Hub)
  19-24. Facts About TravisTorrent
     ● git_branch, git_all_built_commits, ...
     ● gh_project_name, gh_is_pr, gh_team_size, gh_diff_tests_added, gh_first_commit_...
     ● tr_prev_build, tr_status, tr_log_tests_ran (new!), tr_log_tests_failed (new!)
     62 columns in total
  25-32. Community-Driven Development (CDD)
     ● C. M. Rosenberg: “gh_first_commit_created_at and gh_build_started_at are always 22:00 or 23:00 PM” → bug in R’s anytime library (“anytime aims to be that general purpose converter no matter the input” - “You are reading too much into this. We heuristically try a bunch formats. None includes TZ because we can't. No more, no less.” - Well, parsedate can.) (a defensive timestamp-parsing sketch follows after the transcript)
     ● Kevin (unknown): “Does 'tr_tests_ran' indicate whether tests were run during the build? If yes, why is it that in 39698 jobs, 'tr_tests_run' and 'tr_tests_skipped' have a value of 0 but 'tr_tests_ran' has a value of 1?” → inconsistency: tr_tests_run should be NA instead of 0
     ● Sebastian Baltes: first BigQuery integration script
     ● Questions that improved our documentation: Omar Elazhary, Matthias Bussonnier, Ward Muylaert, and many others on Disqus!
     Thank you!
  33-46. MSR Challenge Organization
     Submissions
     ● Record number of 31 submissions
     ● 1 desk-rejected (7 pages!), 1 pulled by the authors
     ● 29 submissions went into review
     Reviews
     ● 3 reviews per paper
     ● Double-blind worked mostly well
     ● Except if you have to double your PC (9 → 20)
     ● Had to re-assign one paper (my bad)
     Outcome
     ● 14 accepted, 15 rejected (acceptance rate: 48%)
     ● On purpose: every paper with a valid contribution should be discussed
  47-54. Lessons Learned (Future Chairs)
     ● Oh, these !”§$%&/()= updates!
     ● Serious (time) commitment to updating, bug fixing, and support
     ● Provide one data format (should have been CSV)
     ● Make your data perfect the first time ;)
     ● If you can, host someone else’s data
     ● Disqus panel proved helpful: 98 comments in total
     ● Make sure your PC is large enough
  55. Paper Arrivals: “click” => early detection of #submissions
  56-60. Topics of Submissions
     keyword                          #accepts   #rejects   freq
     build result                         5          4       31%
     machine learning                     3          4       24%
     prediction                           2          2       14%
     factor analysis                      3          1       14%
     time                                 3          1       14%
     sentiment                            1          2       10%
     external vs. internal contribs       2          1       10%
     project success                      2          -        7%
     testing                              1          1        7%
     attraction                           1          -        3%
     team size                            1          -        3%
     code review                          1          -        3%
     meta-paper                           1          -        3%
  61-70. Awards
     Best Paper
     ● Suggested by reviewers, chosen by the chairs with help from Anna Nagy (Travis CI)
     ● Prize: $200 voucher from Travis CI (& fame)
     ● Winner ...
     Best Presentation
     ● Elected by you, plurality vote!
     ● Prize: surprise (& fame)
  71-72. TravisTorrent: In the future ...
     ● Total infrastructure redesign
     ● Real-time statistics: BLAaaS
     ● Opt-in model for new projects
     ● Self-updating and synced to Google BigQuery
     ● Official Google showcase data set (hopefully)
     ● Addition of new languages
     ● More intelligent log analysis (for example, with neural networks instead of regexes; a sketch of the current regex-style extraction follows after the transcript)
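The community feedback on slides 25-32 traces the reported “always 22:00 or 23:00” anomaly to timezone-unaware timestamp parsing in R’s anytime. Below is a minimal defensive sketch in Python/pandas (not the R tooling discussed on the slide) that parses the two affected columns explicitly as UTC; travistorrent.csv is again a placeholder path.

# Parse TravisTorrent timestamp columns explicitly as UTC so that a
# timezone-unaware parser cannot silently shift them into local time
# (the anomaly reported for gh_first_commit_created_at / gh_build_started_at).
import pandas as pd

builds = pd.read_csv("travistorrent.csv")  # placeholder path, as above

for col in ["gh_first_commit_created_at", "gh_build_started_at"]:
    # utc=True keeps the result timezone-aware; errors="coerce" turns
    # unparseable values into NaT instead of aborting the whole load.
    builds[col] = pd.to_datetime(builds[col], utc=True, errors="coerce")

# Sanity check: commit hours should spread over the whole day rather than
# collapsing onto two adjacent values.
print(builds["gh_first_commit_created_at"].dt.hour.value_counts().head())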

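Slides 71-72 mention moving from regex-based log analysis toward smarter approaches. As a point of reference, here is a toy illustration of the regex style of extraction. It is not TravisTorrent’s actual build-log analyzer: it only matches the standard Maven Surefire summary line found in many Java build logs and aggregates counts in the spirit of tr_log_tests_ran and tr_log_tests_failed.

# Toy regex-based build-log analysis (illustrative only, not the
# TravisTorrent analyzer). Matches the standard Maven Surefire summary line.
import re

# Example line: "Tests run: 42, Failures: 1, Errors: 0, Skipped: 3"
SUREFIRE = re.compile(
    r"Tests run:\s*(\d+),\s*Failures:\s*(\d+),\s*Errors:\s*(\d+),\s*Skipped:\s*(\d+)"
)

def summarize_log(log_text: str) -> dict:
    """Aggregate test counts over all Surefire summary lines in one job log."""
    ran = failed = 0
    for match in SUREFIRE.finditer(log_text):
        tests, failures, errors, _skipped = map(int, match.groups())
        ran += tests
        failed += failures + errors
    # Mirrors the spirit of tr_log_tests_ran / tr_log_tests_failed.
    return {"tests_ran": ran, "tests_failed": failed}

if __name__ == "__main__":
    sample = "Tests run: 42, Failures: 1, Errors: 0, Skipped: 3\n"
    print(summarize_log(sample))  # {'tests_ran': 42, 'tests_failed': 1}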