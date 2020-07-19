
Keynote at-icpc-2020

Keynote by Ralf Lämmel at ICPC 2020 (International Conference on Program Comprehension)

Published in: Science

  1. Comprehension Challenges at the Level of Software Ecosystems and Global Software Engineering
     Keynote by Ralf Lämmel, Facebook London. Virtual ICPC 2020, July 2020.
  2. Context: Software engineering in infrastructure at Facebook
     • Infrastructure: version control, CI, language services, testing automation, ...
     • Data infrastructure: storage engines, query engines, pipelines, metastores, ...
     • AI infrastructure: ML workflows, feature stores, online/offline prediction, ...
  3. Context: Software ecosystems at Facebook
     • App development
     • Service development
     • Internal tool development
     • Release management
     • Bug / incident tracking
     • Language foundation (e.g., Hack + ORM + frameworks + ...)
     • Data warehouse (e.g., Hive, Spark, Dataswarm pipelines)
     "A software ecosystem is a collection of software projects which are developed and which co-evolve together in the same environment. [...] The environment can be physical, like in the case of a company or a research group that has a geo-spatial identity, but can also be virtual, like the projects that are part of an open-source community."
     Source: "Reverse Engineering Software Ecosystems". Dissertation. Mircea F. Lungu. University of Lugano. 2009.
  4. Context: Global software engineering at Facebook
     • Engineering in different time zones
     • Geographically distributed teams
     • Different employee types
     • Frequent org / team / role changes
     • Diversity and inclusion
     "Companies need to use their existing resources as effectively as possible, and they also need to employ resources on a global scale from different sites within the company and from partner companies throughout the world. This has resulted in global software engineering (GSE) [...]"
     Source: "Global software engineering. Challenges and solutions framework". Dissertation. Päivi Parviainen. University of Oulu. 2012.
     Top topics in (IC)GSE: team, project, collaboration, process, communication, ...
     Source: Christof Ebert, Marco Kuhrmann, Rafael Prikladnicki: Global Software Engineering: Evolution and Trends. ICGSE 2016: 144-153
  5. Comprehension challenges in ...
     • Developer workflow analysis
     • Ownership management
     • Code review automation
     It's all about automation: think heuristics, ML. The focus here is on "infrastructure"; no apps / user data. Big Data, anyway!
     How does program comprehension help to address these areas? What are the involved or remaining challenges?
  6. Comprehension Challenges in Developer Workflow Analysis
     Let's focus here on work-item prediction. See the corresponding industry track paper.
  7. Scenarios of work-item prediction I/II
     The 'Incident Response' scenario:
     • Work item: Alert for suboptimal performance
     • Question: The workflow steps to follow in response
     • Automation: Record steps in past instances
     • Challenge: To know when someone is responding
  8. Scenarios of work-item prediction II/II
     The 'Aggregate Performance' scenario:
     • Work item: A diff (a system change)
     • Question: Time spent on diff
     • Automation: Record all activities on diff
     • Challenge: To know when someone is working on the diff
  9. Dark matter in developer workflow analysis
     [Figure: timeline of a developer, with events such as "Query DB interactively", "Commit a version locally", "Read documentation", "Publish a diff", and "Review a diff".]
     The events on the timeline concern different 'diffs' (i.e., system changes all the way from committing a change locally to landing the change in production) as work items. White events are trivially associated with diffs. Gray events require dedicated data integration for association. Black events are hard to associate; advanced heuristics and machine learning may be of ...
  10. Probabilistic work-item prediction
     [Figure: the same developer timeline and caption as on the previous slide, with the events now annotated with example probabilities: 1.0, .8, .3, .5, .1.]
  11. Why do we have dark matter?
     • Tools don't track work items consistently.
     • Tools aren't fully integrated.
     • Logging is not designed with workflow analysis in mind.
     • Developer workflow is somewhat unstructured.
     • Developers engage in a lot of context switching.
     • ...
     Also known elsewhere as: Sukriti Goel, Jyoti M. Bhat, and Barbara Weber. 2013. End-to-End Process Extraction in Process Unaware Systems. In Business Process Management Workshops - BPM 2012 International Workshops. Revised Papers (Lecture Notes in Business Information Processing), Vol. 132. Springer, 162–173.
  12. For instance: consider aspects of tooling!
     • Tools: added, obsoleted, removed
     • Tool functionality: added, removed, revised (new version)
     • Interface: added or removed form, revised schema or semantics
     • Integration with other tools or into suites evolves
     • Logging: schema or semantics evolves
     • Best practices and use cases evolve
     We need automation (ML, heuristics); reverse engineering and re-engineering don't scale!
  13. Context switching in development
     Figure 2: Number of (selected) tools used per employee on a given day for many of Facebook's employees.
     Figure 3: Concurrent workflow by a developer on several diffs (y-axis) over a few days (x-axis).
     We need more than time proximity and high-confidence events.
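A minimal sketch of how a per-employee, per-day tool count like the one in Figure 2 could be derived from a tool-usage log. The event layout `(employee, tool, timestamp)`, the tool names, and the sample data are assumptions made for illustration; none of them come from the talk.

```python
from collections import defaultdict
from datetime import datetime


def tools_per_day(events):
    """Count distinct tools per (employee, day).

    events: iterable of (employee, tool, ISO-8601 timestamp) tuples (hypothetical layout).
    Returns {(employee, "YYYY-MM-DD"): number of distinct tools}.
    """
    seen = defaultdict(set)
    for employee, tool, ts in events:
        day = datetime.fromisoformat(ts).date().isoformat()
        seen[(employee, day)].add(tool)
    return {key: len(tools) for key, tools in seen.items()}


# Hypothetical sample log
events = [
    ("alice", "editor", "2020-07-01T09:00:00"),
    ("alice", "code_review", "2020-07-01T09:20:00"),
    ("alice", "editor", "2020-07-01T11:00:00"),
    ("alice", "ci_dashboard", "2020-07-02T10:00:00"),
    ("bob", "query_console", "2020-07-01T14:00:00"),
]
counts = tools_per_day(events)
```

A count like this only shows how many tools are touched; it says nothing about which work item each tool use belongs to, which is why time proximity alone is not enough.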
  14. A system for diff prediction
     See the corresponding industry track paper.
     System component / notion: Explanation
     • Logging foundation: Integrate all available logs: version control, continuous integration, CLI, internal web-based tools, ...
     • Time windows into dark matter: Use windows of 10 minutes. Wanted: the probability of the employee working on a diff.
     • Candidate work items: Anything the employee may have possibly worked on.
     • Probabilistic ranking based on ML / heuristics: High confidence, e.g., employee submitted diff revision. Low confidence, e.g., employee queried table mentioned in diff.
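The components above can be sketched roughly as follows, assuming a log of `(timestamp, diff id, event kind)` tuples. The event-kind names, the confidence weights, and the normalization of scores into probabilities are hypothetical stand-ins for the "ML / heuristics" on the slide; this is not the system described in the paper.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical confidence per event kind (high vs. low confidence, as on the slide)
CONFIDENCE = {"submit_revision": 1.0, "query_table_in_diff": 0.3}
DEFAULT_CONFIDENCE = 0.1  # assumption for event kinds not listed above


def window_of(ts):
    """Map an ISO-8601 timestamp to the start of its 10-minute window."""
    dt = datetime.fromisoformat(ts)
    return dt.replace(minute=dt.minute - dt.minute % 10, second=0, microsecond=0)


def rank_candidates(events):
    """Rank candidate diffs per 10-minute window.

    events: iterable of (timestamp, diff_id, kind).
    Returns {window_start: [(diff_id, probability), ...]}, highest probability first.
    """
    scores = defaultdict(lambda: defaultdict(float))
    for ts, diff_id, kind in events:
        w = window_of(ts)
        confidence = CONFIDENCE.get(kind, DEFAULT_CONFIDENCE)
        # Keep the strongest evidence seen for each candidate in this window.
        scores[w][diff_id] = max(scores[w][diff_id], confidence)
    ranking = {}
    for w, per_diff in scores.items():
        total = sum(per_diff.values())
        pairs = [(d, s / total) for d, s in per_diff.items()]
        ranking[w] = sorted(pairs, key=lambda p: p[1], reverse=True)
    return ranking


# Hypothetical sample log for one employee
events = [
    ("2020-07-01T09:03:00", "D1", "submit_revision"),
    ("2020-07-01T09:07:30", "D2", "query_table_in_diff"),
    ("2020-07-01T09:12:00", "D2", "query_table_in_diff"),
]
ranking = rank_candidates(events)
```

Normalizing within a window is a simplification: it forces the candidates to sum to 1 and therefore ignores the possibility that the employee worked on none of them.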
  15. Related work discussion on developer workflow analysis I/II
     • Wouter Poncin, Alexander Serebrenik, Mark van den Brand: Process Mining Software Repositories. CSMR 2011: 5-14
       Note: Case studies on developer role and bug lifecycle.
     • Roberto Minelli, Michele Lanza: Visualizing the workflow of developers. VISSOFT 2013: 1-4
       Note: Visualization of workflows in an IDE.
     • Kostadin Damevski, Hui Chen, David C. Shepherd, Nicholas A. Kraft, Lori L. Pollock: Predicting Future Developer Behavior in the IDE Using Topic Models. IEEE Trans. Software Eng. 44(11): 1100-1111 (2018)
       Note: Predict whether someone continues debugging or starts editing.
  16. Related work discussion on developer workflow analysis II/II
     • Diogo R. Ferreira, Daniel Gillblad: Discovering Process Models from Unlabelled Event Logs. BPM 2009: 143-158
       Note: Discovery of process models with case IDs.
     • Niek Tax, Natalia Sidorova, Reinder Haakma, Wil M. P. van der Aalst: Event Abstraction for Process Mining using Supervised Learning Techniques. CoRR abs/1606.07283 (2016)
       Note: ML for event abstraction.
     • R. P. Jagadeesh Chandra Bose, Wil M. P. van der Aalst, Indre Zliobaite, Mykola Pechenizkiy: Dealing With Concept Drifts in Process Mining. IEEE Trans. Neural Networks Learn. Syst. 25(1): 154-171 (2014)
       Note: Concept drift refers to the situation in which the process is changing while being analyzed.
     • Pieter De Koninck, Seppe vanden Broucke, Jochen De Weerdt: act2vec, trace2vec, log2vec, and model2vec: Representation Learning for Business Processes. BPM 2018: 305-321
       Note: Representation learning for trace clustering and process model comparison.
  17. Comprehension challenges in ...
     • Developer workflow analysis
     • Ownership management
     • Code review automation
  18. Comprehension Challenges in Ownership Management
     This relates to our work on Ownesty. See the corresponding industry track paper.
  19. What's ownership management?
     Each asset has the most accountable owner at all times.
     Software and data assets: Hive tables, pipelines, ML models, files in repos, ...
     The owner is the POC for all matters regarding reliability, security, privacy, etc.
  20. Architecture of an Ownesty-style Ownership Recommendation System
     [Figure: architecture diagram. Recoverable labels: Metastore, Assets, Logs, Extraction, Features, Composition, Feature vectors, Labeling events, Labeling, Labeled data, Training/Test, Interpretable models, Prediction, Predictions, Sugar coating, Explainable recommendations, Tooling/tasks.]
     Example of a logged event: Employee e queried table t.
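As an illustration of the extraction step, the sketch below turns log events of the form "Employee e queried table t" into one feature vector per (asset, employee) pair. The action names, the tuple layout, and the vector layout are assumptions; the actual features of the system are described in the industry track paper, not here.

```python
from collections import Counter, defaultdict

# Hypothetical event kinds, in the order used for the feature vector
FEATURES = ("queried", "edited", "created")


def extract_features(log):
    """Build feature vectors from log events.

    log: iterable of (employee, action, asset) tuples.
    Returns {(asset, employee): [count per entry of FEATURES]}.
    """
    counts = defaultdict(Counter)
    for employee, action, asset in log:
        if action in FEATURES:  # ignore event kinds we have no feature for
            counts[(asset, employee)][action] += 1
    return {key: [c[f] for f in FEATURES] for key, c in counts.items()}


# Hypothetical sample log
log = [
    ("e", "queried", "t"),
    ("e", "queried", "t"),
    ("e", "created", "t"),
    ("f", "queried", "t"),
    ("f", "viewed", "t"),  # unknown action, skipped
]
vectors = extract_features(log)
```

Vectors of this kind would then be joined with labeled data (for example, confirmed owners) to train and test a model.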
  21. Basic challenges in ownership management
     See the corresponding industry track paper.
     Challenge: Details
     • Ownership decay: How to know whether to trust owners on file?
     • Asset subclassing: How to identify and handle specific subsets of assets?
     • Team-level ownership: How to assign teams as owners with individual signal?
     • Ranking owner candidates: What ranking to use to recommend one or more candidates?
     • Whole/part asset relationships: How to obey those relationships with recommendations?
     • Monotonic features: How to make sure that more means more likely owner?
     • Explainable recommendations: How to explain recommendations to users so that they accept them?
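Three of these challenges (ranking owner candidates, monotonic features, explainable recommendations) can be illustrated together with a deliberately simple linear score. The feature names and weights below are hypothetical; the only property the sketch relies on is that all weights are non-negative, so a larger feature value can never lower a candidate's score.

```python
# Hypothetical non-negative weights: non-negativity makes the score monotonic
WEIGHTS = {"queried": 0.2, "edited": 1.0, "created": 3.0}


def score(features):
    """Weighted sum of feature counts; more signal never decreases the score."""
    return sum(WEIGHTS[name] * count for name, count in features.items())


def explain(features):
    """Name the feature that contributes most to the score."""
    contributions = {n: WEIGHTS[n] * c for n, c in features.items() if c}
    if not contributions:
        return "no signal"
    top = max(contributions, key=contributions.get)
    return f"top signal: {top} (x{features[top]})"


def rank(candidates):
    """Order candidate owners by descending score."""
    return sorted(candidates, key=lambda e: score(candidates[e]), reverse=True)


# Hypothetical candidates for one asset
candidates = {
    "alice": {"queried": 5, "edited": 2, "created": 0},
    "bob": {"queried": 1, "edited": 0, "created": 1},
}
ordered = rank(candidates)
reason = explain(candidates["alice"])
```

A linear model is only one way to obtain monotonicity and a readable explanation; it is used here because its behavior can be checked by hand.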
  22. For instance: Team-level ownership. Consider team changes!
     • Team level: split, merger, termination
     • Individual level: team move, function change, hack a month
     • Types of teams: oncall rotations, reporting teams, organizations, ad-hoc teams
     • Types of functions: engineer, manager, FTE/STE/intern, data scientist
  23. Open problems and challenges: some related areas
     • Heterogeneity of Owned Assets: reviewer recommendation
     • Dependency Awareness: call graph, variability, package management, build management, traceability recovery, lineage, provenance, ..., feature location, slicing
     • Workflow and Organizational Aspects: project management, process mining
     • Understandable Recommendations: interpretable models, explainable recommendations, counterfactuals
     See the industry track paper for details.
  24. Related work discussion on ownership management
     • Yue Yu, Huaimin Wang, Gang Yin, Tao Wang: Reviewer recommendation for pull-requests in GitHub: What can we learn from code review and bug assignment? Inf. Softw. Technol. 74: 204-218 (2016)
       Note: Find the approach with the best recommendation performance.
     • Vaibhavi Kalgutkar, Ratinder Kaur, Hugo Gonzalez, Natalia Stakhanova, Alina Matyukhina: Code Authorship Attribution: Methods and Challenges. ACM Comput. Surv. 52(1): 3:1-3:36 (2019)
       Note: A neighboring area related to plagiarism and malware detection.
     • Bixin Li, Xiaobing Sun, Hareton Leung, Sai Zhang: A survey of code-based change impact analysis techniques. Softw. Test. Verification Reliab. 23(8): 613-646 (2013)
       Note: Useful for tracking ownership along dependencies!?
  25. Comprehension challenges in ...
     • Developer workflow analysis
     • Ownership management
     • Code review automation
  26. Comprehension Challenges in Code Review Automation
  27. Code review: all great?
     "The most frequent reasons for confusion are the missing rationale, discussion of non-functional requirements of the solution, and lack of familiarity with existing code. We observe that tools (code review, issue tracker, and version control) and communication issues, such as disagreement or ambiguity in communicative intentions, may also cause confusion during code reviews."
     Source: Felipe Ebert, Fernando Castor, Nicole Novielli, Alexander Serebrenik: Confusion in Code Reviews: Reasons, Impacts, and Coping Strategies. SANER 2019: 49-60
  28. Background on code review (automation) I/II
     • Mika Mäntylä, Casper Lassenius: What Types of Defects Are Really Discovered in Code Reviews? IEEE Trans. Software Eng. 35(3): 430-448 (2009)
     • Devarshi Singh, Varun Ramachandra Sekar, Kathryn T. Stolee, Brittany Johnson: Evaluating how static analysis tools can reduce code review effort. VL/HCC 2017: 101-105
     • Laura MacLeod, Michaela Greiler, Margaret-Anne D. Storey, Christian Bird, Jacek Czerwonka: Code Reviewing in the Trenches: Challenges and Best Practices. IEEE Softw. 35(4): 34-42 (2018)
     • Jacek Czerwonka, Michaela Greiler, Christian Bird, Lucas Panjer, Terry Coatta: CodeFlow: Improving the Code Review Process at Microsoft. ACM Queue 16(5): 20 (2018)
     Notes on the slide:
     • Finding relevant documentation about changes was another frequently reported challenge: 'what it's doing and how it's integrated with everything else.'
     • Functional versus evolvability defects.
     • PMD preempts 16% of comments. Another 17% could be implemented.
     • Open fundamental factor: how to avoid code changes with unrelated concerns.
  29. Background on code review (automation) II/II
     • Davide Spadini, Fabio Palomba, Tobias Baum, Stefan Hanenberg, Magiel Bruntink, Alberto Bacchelli: Test-driven code review: an empirical study. ICSE 2019: 1061-1072
       Note: Evolvability defects may be missed when using test-driven code review.
     • Eliane Stampfer Wiese, Anna N. Rafferty, Daniel M. Kopta, Jacqulyn M. Anderson: Replicating novices' struggles with coding style. ICPC 2019: 13-18
       Note: Good style is somewhat subjective; sometimes arbitrary.
     • Fabiano Pecorelli, Fabio Palomba, Dario Di Nucci, Andrea De Lucia: Comparing heuristic and machine learning approaches for metric-based code smell detection. ICPC 2019: 93-104
       Note: It's difficult to delegate smell detection to ML.
  30. Related products
     • https://www.phacility.com/phabricator/
     • https://www.jetbrains.com/upsource/
     • https://aws.amazon.com/codeguru/
     • https://www.codacy.com/
     • ...
  31. Some automation themes in code review automation
     • Signal selection
     • Code fixes
     • Comments
     • Commit summaries
     • Test plans
     • Review decisions
     • Review comments
     Let's look at a few representative papers.
  32. Signal selection (Code review automation)
     • Mateusz Machalica, Alex Samylkin, Meredith Porth, Satish Chandra: Predictive test selection. ICSE (SEIP) 2019: 91-100
       Note: Reduce infrastructural costs for testing without missing (many) faulty changes.
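To make the idea of test selection concrete, here is a toy sketch that picks tests for a change based on how often each test failed in the past when the same files were touched. This frequency table is a stand-in chosen for brevity; it is not the learned model from the cited paper, and the history format, file names, and threshold are assumptions.

```python
from collections import defaultdict


def failure_rates(history):
    """Estimate how often each test failed when a given file was changed.

    history: list of (changed_files, {test_name: failed}) pairs.
    Returns {(file, test_name): failure rate in [0, 1]}.
    """
    runs = defaultdict(int)
    fails = defaultdict(int)
    for changed_files, results in history:
        for f in changed_files:
            for test, failed in results.items():
                runs[(f, test)] += 1
                fails[(f, test)] += int(failed)
    return {key: fails[key] / runs[key] for key in runs}


def select_tests(changed_files, rates, threshold=0.1):
    """Select tests whose estimated failure rate for any changed file meets the threshold."""
    selected = set()
    for (f, test), rate in rates.items():
        if f in changed_files and rate >= threshold:
            selected.add(test)
    return sorted(selected)


# Hypothetical history of past changes and their test outcomes
history = [
    ({"a.py"}, {"test_a": True, "test_b": False}),
    ({"a.py"}, {"test_a": False, "test_b": False}),
    ({"b.py"}, {"test_a": False, "test_b": True}),
]
rates = failure_rates(history)
selected = select_tests({"a.py"}, rates)
```

The threshold expresses the trade-off named on the slide: raising it saves more test runs but increases the chance of skipping a test that would have caught a faulty change.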
  33. Code fixes (Code review automation)
     • Johannes Bader, Andrew Scott, Michael Pradel, Satish Chandra: Getafix: learning to fix bugs automatically. Proc. ACM Program. Lang. 3(OOPSLA): 159:1-159:27 (2019)
       Note: Tree differencing, anti-unification and hierarchical clustering.
     • Saikat Chakraborty, Miltiadis Allamanis, Baishakhi Ray: CODIT: Code Editing with Tree-Based Neural Machine Translation. https://arxiv.org/abs/1810.00314 (2019)
       Note: Neural networks instead.
     • Hussein Alrubaye, Mohamed Wiem Mkaouer, Ali Ouni: On the use of information retrieval to automate the detection of third-party Java library migration at the method level. ICPC 2019: 347-357
       Note: Method mapping recovery based on an IR approach.
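Anti-unification, one of the techniques named above, generalizes two concrete trees into a pattern with holes where they differ. The sketch below shows plain first-order anti-unification on trees encoded as nested tuples `(label, child, ...)`; the encoding and the hole names `?0`, `?1`, ... are assumptions, and the code does not reproduce the cited tool.

```python
def anti_unify(a, b, holes=None):
    """Return a pattern that matches both trees.

    Trees are strings (leaves) or tuples (label, child, ...).
    Differing subtrees are replaced by holes "?0", "?1", ...;
    the same pair of differing subtrees always maps to the same hole.
    """
    if holes is None:
        holes = {}
    if a == b:
        return a
    same_shape = (
        isinstance(a, tuple)
        and isinstance(b, tuple)
        and len(a) == len(b)
        and a[0] == b[0]
    )
    if same_shape:
        # Same label and arity: keep the node, generalize the children.
        children = tuple(anti_unify(x, y, holes) for x, y in zip(a[1:], b[1:]))
        return (a[0],) + children
    key = (a, b)
    if key not in holes:
        holes[key] = f"?{len(holes)}"
    return holes[key]


# Two hypothetical edits that differ only in a variable name
first = ("call", "equals", ("name", "x"), ("const", "1"))
second = ("call", "equals", ("name", "y"), ("const", "1"))
pattern = anti_unify(first, second)
repeated = anti_unify(("add", "x", "x"), ("add", "y", "y"))
```

Reusing a hole for a repeated pair keeps the pattern as specific as possible: `repeated` records that both operands are the same unknown, not two unrelated ones.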
  34. Comments (Code review automation)
     • Xing Hu, Ge Li, Xin Xia, David Lo, Zhi Jin: Deep code comment generation with hybrid lexical and syntactical information. Empirical Software Engineering 25(3): 2179-2217 (2020), and (by the same authors): Deep code comment generation. ICPC 2018: 200-210
       Note: ASTs are converted to sequences, followed by neural machine translation.
     • Juan Zhai, Xiangzhe Xu, Yu Shi, Guanhong Tao, Minxue Pan, Shiqing Ma, Lei Xu, Weifeng Zhang, Lin Tan, Xiangyu Zhang: CPC: automatically classifying and propagating natural language comments via program analysis. ICSE 2020
       Note: Scenarios: (i) generate missing comments; (ii) use comments as assertions.
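The "AST to sequence" step can be pictured with Python's own `ast` module: the sketch below flattens a syntax tree into a bracketed token sequence that keeps the nesting recoverable, which is the kind of input a sequence-to-sequence model could consume. The bracketing scheme is a simplified illustration and is not claimed to match the encoding used in the cited papers, which target Java.

```python
import ast


def ast_to_sequence(node):
    """Flatten an AST into a bracketed pre-order sequence of node type names.

    Each subtree is emitted as "(", Type, <children...>, ")", Type,
    so the tree structure can be reconstructed from the sequence.
    """
    name = type(node).__name__
    seq = ["(", name]
    for child in ast.iter_child_nodes(node):
        seq.extend(ast_to_sequence(child))
    seq.extend([")", name])
    return seq


tree = ast.parse("x = 1")
seq = ast_to_sequence(tree)
```

Only node types are emitted here; a practical encoding would also have to carry identifiers and literals, which is where the lexical information mentioned in the paper title comes in.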
  35. Commit summaries (Code review automation)
     • Jingjing Liang, Yaozong Hou, Shurui Zhou, Junjie Chen, Yingfei Xiong, Gang Huang: How to Explain a Patch: An Empirical Study of Patch Explanations in Open Source Projects. ISSRE 2019: 58-69
       Note: To generate a patch explanation, it is important to first understand how patches were explained.
     • Shurui Zhou, Stefan Stanciulescu, Olaf Leßenich, Yingfei Xiong, Andrzej Wasowski, Christian Kästner: Identifying features in forks. ICSE 2018: 105-116; see also: Luyao Ren, Shurui Zhou, Christian Kästner: Forks insight: providing an overview of GitHub forks. ICSE (Companion Volume) 2018: 179-180
       Note: Fork summaries: a multi-dimensional dependency graph is computed from the changed code (integrating def-use, control flow, and adjacency); clusters of changed syntax nodes are computed from the graph by community detection; the resulting clusters are labelled by TF-IDF and friends.
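The last step of the fork-summary pipeline, labelling clusters by TF-IDF, can be sketched in a few lines. The sketch assumes the clusters have already been computed (graph construction and community detection are not shown) and that each cluster is represented by the identifier tokens of its changed code; the cluster contents are hypothetical.

```python
import math
from collections import Counter


def label_clusters(clusters, top_k=2):
    """Label each cluster with its top_k tokens by TF-IDF.

    clusters: {cluster_id: list of tokens}; each cluster is treated as one document.
    Returns {cluster_id: [token, ...]}, highest score first, ties broken alphabetically.
    """
    n = len(clusters)
    doc_freq = Counter()
    for tokens in clusters.values():
        doc_freq.update(set(tokens))  # count each token once per cluster
    labels = {}
    for cid, tokens in clusters.items():
        tf = Counter(tokens)
        scores = {
            t: (c / len(tokens)) * math.log(n / doc_freq[t]) for t, c in tf.items()
        }
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        labels[cid] = ranked[:top_k]
    return labels


# Hypothetical clusters of changed code, as bags of identifier tokens
clusters = {
    "c1": ["cache", "cache", "evict", "config"],
    "c2": ["login", "token", "token", "config"],
}
labels = label_clusters(clusters)
```

Tokens that occur in every cluster, such as `config` here, get an inverse document frequency of zero and therefore never become labels, which is the point of using TF-IDF rather than raw counts.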
  36. Review decisions (Code review automation)
     • Shu-Ting Shi, Ming Li, David Lo, Ferdian Thung, Xuan Huo: Automatic Code Review by Learning the Revision of Source Code. AAAI 2019: 4910-4917
       Note: Deep learning-based approach which takes into account context for changes.
  37. Review comments (Code review automation)
     • Jing Kai Siow, Cuiyun Gao, Lingling Fan, Sen Chen, Yang Liu: CORE: Automating Review Recommendation for Code Changes. SANER 2020: 284-295
     • Anshul Gupta, Neel Sundaresan: Intelligent code reviews using deep learning. KDD'18 Deep Learning Day, August 2018, London, UK
     Note: Deep learning / embedding for suggesting review comments for changes.
     NB: There is a notable difference between review comment suggestion and linting based on learned rules! That is, ...
  38. Challenges in code review automation
     • How to make nitpicking obsolete?
     • How to assess the reliability of a review?
     • What is a good commit summary?
     • What additional info to provide?
     • What is anomalous code?
  39. Entities involved in code review (at Facebook)
     • Diff: diff summary, diff test plan, commit, CI signal
     • Miscellaneous: task (bug or feature), alert, incident, root causing diff
     How to improve code review automation? Hypothesis: it needs a combination of these:
     • Knowledge graph
     • Change impact analysis
     • Traceability recovery
     • Summaries
  40. Any comprehension challenges other than in ...
     • developer workflow analysis,
     • ownership management, and
     • code review automation?
     Of course:
     • Provenance (privacy)
     • Dependencies (reliability)
     • ...
  41. Thanks! Let's discuss.