Comprehension Challenges at the
Level of Software Ecosystems and
Global Software Engineering
Keynote by Ralf Lämmel, Facebook London
Virtual ICPC 2020, July 2020
2
Infrastructure
Version control, CI, language services, testing automation, ...
Data infrastructure
Storage engines, query engines, pipelines, metastores, ...
AI infrastructure
ML workflows, feature stores, online/offline prediction, ...
Context ➖ Software engineering in infrastructure at Facebook
3
• App development
• Service development
• Internal tool development
• Release management
• Bug / incident tracking
• Language foundation (e.g., Hack + ORM + frameworks + ...)
• Data warehouse (e.g., Hive, Spark, Dataswarm pipelines)
Context ➖ Software ecosystems at Facebook
"A software ecosystem is a collection of software
projects which are developed and which co-
evolve together in the same environment. [...] The
environment can be physical, like in the case of a
company or a research group that has a geo-
spatial identity, but can also be virtual, like the
projects that are part of an open-source
community."
Source: "Reverse Engineering Software
Ecosystems". Dissertation. Mircea F. Lungu.
University of Lugano. 2009.
4
• Engineering in different time zones
• Geographically distributed teams
• Different employee types
• Frequent org / team / role changes
• Diversity and inclusion
Context ➖ Global software engineering at Facebook
"Companies need to use their existing resources as
effectively as possible, and they also need to employ
resources on a global scale from different sites within the
company and from partner companies throughout the world.
This has resulted in global software engineering (GSE) [...]"
Source: "Global software engineering. Challenges and
solutions framework". Dissertation. Päivi Parviainen.
University of Oulu. 2012.
Top topics in (IC)GSE:
• Team
• Project
• Collaboration
• Process
• Communication
• ...
Source: Christof Ebert, Marco Kuhrmann, Rafael Prikladnicki:
Global Software Engineering: Evolution and Trends.
ICGSE 2016: 144-153
5
• Developer workflow analysis
• Ownership management
• Code review automation
Comprehension challenges in ...
It's all about automation -- think heuristics, ML.
Focus on "infrastructure" here -- no apps / user data
Big Data, anyway!
How does program comprehension help to
address these areas? What are the
involved or remaining challenges?
6
Comprehension Challenges in
Developer Workflow Analysis
Let's focus here on work-item prediction.
See the corresponding industry track paper.
7
Scenarios of work-item prediction I/II
The ‘Incident Response’ Scenario:

• Work item: Alert for suboptimal performance

• Question: The workflow steps to follow in response

• Automation: Record steps in past instances

• Challenge: To know when someone is responding
8
Scenarios of work-item prediction II/II
The ‘Aggregate Performance’ Scenario:

• Work item: A diff (a system change)

• Question: Time spent on diff

• Automation: Record all activities on diff

• Challenge: To know when someone is working on the diff
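As a rough illustration of the 'Aggregate Performance' scenario, the sketch below estimates the time spent on a diff from the timestamps of activities already attributed to it. The idle cutoff, the function name `time_spent`, and the sample timestamps are assumptions made for this example; the talk does not prescribe this heuristic.

```python
from datetime import datetime, timedelta

# Hypothetical idle cutoff: longer gaps are treated as time spent elsewhere.
IDLE_CUTOFF = timedelta(minutes=10)

def time_spent(timestamps, idle_cutoff=IDLE_CUTOFF):
    """Sum the gaps between consecutive activities, skipping idle gaps."""
    ordered = sorted(timestamps)
    total = timedelta()
    for earlier, later in zip(ordered, ordered[1:]):
        gap = later - earlier
        if gap <= idle_cutoff:
            total += gap
    return total

# Made-up activity timestamps for one diff.
activities = [
    datetime(2020, 7, 1, 9, 0),
    datetime(2020, 7, 1, 9, 4),
    datetime(2020, 7, 1, 9, 9),
    datetime(2020, 7, 1, 13, 0),  # resumes after a long break
    datetime(2020, 7, 1, 13, 6),
]
print(time_spent(activities))
```

A fixed cutoff is the weak point of such a heuristic: it cannot tell whether the developer was still working on the diff during a gap, which is exactly the challenge named on the slide.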
9
Dark matter in developer workflow analysis
[Figure: Time line of a developer, with events such as: Query DB interactively; Commit a version locally; Read documentation; Publish a diff; Review a diff]
The events on the timeline concern different ‘diffs’ (i.e., system changes
all the way from committing a change locally to landing the change in
production) as work items. White events are trivially associated with diffs.
Gray events require dedicated data integration for association. Black events
are hard to associate; advanced heuristics and machine learning may be of [...]
10
Probabilistic work-item prediction
[Figure: the same developer timeline as on the previous slide, with events annotated with probabilities: 1.0, .8, .3, .5, .1]
11
• Tools don't track work items consistently.
• Tools aren't fully integrated.
• Logging is not designed with workflow analysis in mind.
• Developer workflow is somewhat unstructured.
• Developers engage in a lot of context switching.
• ...
Why do we have dark matter?
Also known elsewhere as:
Sukriti Goel, Jyoti M. Bhat, and Barbara Weber. 2013.
End-to-End Process Extraction in Process Unaware Systems.
In Business Process Management Workshops -
BPM 2012 International Workshops. Revised Papers (Lecture Notes in
Business Information Processing), Vol. 132. Springer, 162–173.
12
• Tools ➖ added, obsoleted, removed
• Tool functionality ➖ added, removed, revised (new version)
• Interface ➖ added or removed form, revised schema or semantics
• Integration with other tools or into suites evolves
• Logging ➖ schema or semantics evolves
• Best practices and use cases evolve
For instance: consider aspects of tooling!
We need automation (ML & heuristics) -- reverse and re-engineering doesn't scale!
13
Context switching in development
Figure 2: Number of (selected) tools used per employee on a given day for many of Facebook’s employees.
Figure 3: Concurrent workflow by a developer on several diffs (y-axis) over a few days (x-axis).
We need more than time proximity and high-confidence events.
14
A system for diff prediction
See the corresponding industry track paper.
System component / notion | Explanation
Logging foundation | Integrate all available logs: version control, continuous integration, CLI, internal web-based tools, ...
Time windows into dark matter | Use windows of 10 minutes. Wanted: the probability of the employee working on a diff.
Candidate work items | Anything the employee may have possibly worked on
Probabilistic ranking based on ML / heuristics | High confidence, e.g., employee submitted diff revision. Low confidence, e.g., employee queried table mentioned in diff
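The components above can be sketched as follows: events are bucketed into 10-minute windows, and candidate diffs in each window are ranked by accumulated confidence. The event kinds, the weights, and the normalization into a probability are assumptions for illustration only; the actual system is described in the industry track paper.

```python
from collections import defaultdict

WINDOW_SECONDS = 10 * 60  # 10-minute windows, as on the slide

# Hypothetical confidence weights per event kind (not taken from the talk).
WEIGHTS = {"submitted_revision": 1.0, "queried_table_in_diff": 0.2}

def rank_candidates(events):
    """events: iterable of (unix_time, event_kind, diff_id).

    Returns {window_index: [(diff_id, probability), ...]}, best first.
    """
    scores = defaultdict(lambda: defaultdict(float))
    for ts, kind, diff_id in events:
        window = int(ts // WINDOW_SECONDS)
        scores[window][diff_id] += WEIGHTS.get(kind, 0.0)
    ranking = {}
    for window, per_diff in scores.items():
        total = sum(per_diff.values())
        if total == 0:
            continue  # only unknown event kinds: nothing to rank
        ranking[window] = sorted(
            ((diff_id, s / total) for diff_id, s in per_diff.items()),
            key=lambda pair: pair[1],
            reverse=True,
        )
    return ranking

# Made-up events of one employee.
events = [
    (30, "submitted_revision", "D1"),
    (200, "queried_table_in_diff", "D2"),
    (700, "queried_table_in_diff", "D2"),
]
ranking = rank_candidates(events)
print(ranking)
```

In this toy data, the first window ranks D1 above D2 because a submitted revision counts as a high-confidence event, while the second window contains only a low-confidence event for D2.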
15
Related work discussion on developer workflow analysis I/II
Wouter Poncin, Alexander Serebrenik, Mark van den Brand:
Process Mining Software Repositories. CSMR 2011: 5-14
Roberto Minelli, Michele Lanza: Visualizing the workflow of
developers. VISSOFT 2013: 1-4
Kostadin Damevski, Hui Chen, David C. Shepherd, Nicholas A.
Kraft, Lori L. Pollock: Predicting Future Developer
Behavior in the IDE Using Topic Models.  IEEE Trans.
Software Eng. 44(11): 1100-1111 (2018)
Case studies on developer
role and bug lifecycle.
Visualization of workflows
in an IDE.
Predict whether someone
continues debugging or
starts editing.
16
Related work discussion on developer workflow analysis II/II
Diogo R. Ferreira, Daniel Gillblad: Discovering Process Models
from Unlabelled Event Logs.  BPM 2009: 143-158
Niek Tax, Natalia Sidorova, Reinder Haakma, Wil M. P. van der Aalst:
Event Abstraction for Process Mining using Supervised
Learning Techniques.  CoRR abs/1606.07283 (2016)
R. P. Jagadeesh Chandra Bose, Wil M. P. van der Aalst, Indre
Zliobaite, Mykola Pechenizkiy: Dealing With Concept Drifts in
Process Mining.  IEEE Trans. Neural Networks Learn.
Syst. 25(1): 154-171 (2014)
Pieter De Koninck, Seppe vanden Broucke, Jochen De Weerdt:
act2vec, trace2vec, log2vec, and model2vec:
Representation Learning for Business
Processes. BPM 2018: 305-321
Discovery of process
models with case IDs
ML for event abstraction
Concept drift refers to the
situation in which the
process is changing while
being analyzed.
Representation learning
for trace clustering and
process model comparison
17
• Developer workflow analysis
• Ownership management
• Code review automation
Comprehension challenges in ...
18
Comprehension Challenges in
Ownership Management
This relates to our work on Ownesty.
See the corresponding industry track paper.
19
What's ownership management?
Each asset has the most accountable owner at all times.
Software & data assets:
Hive tables,
Pipelines,
ML models,
Files in repos,
...
POC for all means regarding
reliability,
security,
privacy,
et al.
20
Architecture of an
Ownesty-style Ownership Recommendation System
[Figure: architecture diagram with the labels Metastore, Assets, Logs, Extraction, Features, Composition, Feature vectors, Labeling events, Labeling, Labeled data, Training/Test, Interpretable models, Prediction, Predictions, Sugar coating, Explainable recommendations, Tooling/tasks]
For instance:
Employee e
queried table t.
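To make the step from logs to feature vectors concrete, here is a minimal sketch that counts log events such as "employee e queried table t" per (employee, asset) pair. The record layout, the feature names, and the function `feature_vectors` are hypothetical and are not taken from Ownesty.

```python
from collections import Counter

# Hypothetical log records: (employee, action, asset).
logs = [
    ("e1", "queried", "t"),
    ("e1", "queried", "t"),
    ("e2", "queried", "t"),
    ("e1", "edited_schema", "t"),
]

FEATURES = ["queried", "edited_schema"]  # assumed feature names

def feature_vectors(records, features=FEATURES):
    """Count events per (employee, asset) pair and lay them out as vectors."""
    counts = Counter(records)
    pairs = sorted({(emp, asset) for emp, _, asset in records})
    return {
        (emp, asset): [counts[(emp, feature, asset)] for feature in features]
        for emp, asset in pairs
    }

vectors = feature_vectors(logs)
print(vectors)
```

Each vector position corresponds to one feature in `FEATURES`, so the same layout can feed both training and prediction.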
21
Basic challenges in ownership management
See the corresponding industry track paper.
Challenge | Details
Ownership decay | How to know whether to trust owners on file?
Asset subclassing | How to identify and handle specific subsets of assets?
Team-level ownership | How to assign teams as owners with individual signal?
Ranking owner candidates | What ranking to use to recommend one or more candidates?
Whole/part asset relationships | How to obey those relationships with recommendations?
Monotonic features | How to make sure that more means more likely owner?
Explainable recommendations | How to explain recommendations to users so that they accept them?
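Three of these challenges (ranking, monotonic features, explainable recommendations) can be illustrated together with a deliberately simple scoring model. The sketch assumes a weighted sum with non-negative weights, which is one way to guarantee that a higher feature count never lowers a candidate's score. The weights, feature names, and candidates are invented for the example and do not describe the model used in practice.

```python
# Hypothetical non-negative weights; non-negativity makes the score monotonic:
# increasing any feature count can never lower a candidate's score.
WEIGHTS = {"queried": 0.5, "edited_schema": 3.0, "landed_diff": 2.0}

def score(features):
    """Weighted sum of non-negative feature counts."""
    return sum(WEIGHTS[name] * count for name, count in features.items())

def recommend(candidates, top_k=2):
    """Rank candidates by score; report the strongest feature as explanation."""
    ranked = sorted(candidates.items(), key=lambda kv: score(kv[1]), reverse=True)
    result = []
    for owner, features in ranked[:top_k]:
        strongest = max(features, key=lambda name: WEIGHTS[name] * features[name])
        result.append((owner, score(features), strongest))
    return result

# Made-up owner candidates for one asset.
candidates = {
    "alice": {"queried": 4, "edited_schema": 2, "landed_diff": 0},
    "bob": {"queried": 10, "edited_schema": 0, "landed_diff": 1},
    "carol": {"queried": 1, "edited_schema": 0, "landed_diff": 0},
}
recommendations = recommend(candidates)
print(recommendations)
```

Reporting the strongest contributing feature is only the crudest form of explanation, but it shows why an interpretable model makes the explanation step straightforward.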
22
• Team level
• Split
• Merger
• Termination
• Individual level
• Team move
• Function change
• Hack a month
• Types of teams
• Oncall rotations
• Reporting teams
• Organizations
• Ad-hoc teams
• Types of functions
• Engineer
• Manager
• FTE/STE/intern
• Data scientist
For instance: Team-level ownership -- consider team changes!
23
Heterogeneity of Owned Assets
• Reviewer recommendation
Dependency Awareness
• Call graph, variability, package management, build management,
traceability recovery, lineage, provenance, ..., feature location, slicing
Workflow and Organizational Aspects
• Project management, process mining
Understandable Recommendations
• Interpretable models, explainable recommendations, counterfactuals
Open problems and challenges -- some related areas
See the industry track paper for details.
24
Related work discussion on ownership management
Yue Yu, Huaimin Wang, Gang Yin, Tao Wang:
Reviewer recommendation for pull-requests in GitHub:
What can we learn from code review and bug
assignment? Inf. Softw. Technol. 74: 204-218 (2016)
Vaibhavi Kalgutkar, Ratinder Kaur, Hugo Gonzalez, Natalia
Stakhanova, Alina Matyukhina: Code Authorship Attribution:
Methods and Challenges. ACM Comput.
Surv. 52(1): 3:1-3:36 (2019)
Bixin Li, Xiaobing Sun, Hareton Leung, Sai Zhang: A survey of
code-based change impact analysis techniques. Softw. Test.
Verification Reliab. 23(8): 613-646 (2013)
Find the approach with
the best recommendation
performance
A neighboring area
related to plagiarism and
malware detection
Useful for tracking
ownership along
dependencies!?
25
• Developer workflow analysis
• Ownership management
• Code review automation
Comprehension challenges in ...
26
Comprehension Challenges in
Code Review Automation
27
Code review -- all great?
The most frequent reasons for confusion are the missing rationale,
discussion of non-functional requirements of the solution, and
lack of familiarity with existing code. We observe that tools (code
review, issue tracker, and version control) and communication issues,
such as disagreement or ambiguity in communicative intentions, may also
cause confusion during code reviews.
Source:
Felipe Ebert, Fernando Castor, Nicole Novielli, Alexander Serebrenik:
Confusion in Code Reviews: Reasons, Impacts, and Coping Strategies. 
SANER 2019: 49-60
28
Background on code review (automation) I/II
Mika Mäntylä, Casper Lassenius: What Types of Defects Are Really
Discovered in Code Reviews?  IEEE Trans. Software
Eng. 35(3): 430-448 (2009)
Devarshi Singh, Varun Ramachandra Sekar, Kathryn T.
Stolee, Brittany Johnson: Evaluating how static analysis tools can
reduce code review effort.  VL/HCC 2017: 101-105
Laura MacLeod, Michaela Greiler, Margaret-Anne D.
Storey, Christian Bird, Jacek Czerwonka: Code Reviewing in the
Trenches: Challenges and Best Practices.  IEEE
Softw. 35(4): 34-42 (2018)
Jacek Czerwonka, Michaela Greiler, Christian Bird, Lucas
Panjer, Terry Coatta: CodeFlow: Improving the Code Review
Process at Microsoft.  ACM Queue 16(5): 20 (2018)
Finding relevant documentation
about changes was another
frequently reported challenge:
'what it’s doing and how it’s
integrated with everything else.'
Functional versus evolvability defects
PMD preempts 16%
comments. Another 17%
could be implemented.
Open fundamental factor:
how to avoid code changes
with unrelated concerns
29
Background on code review (automation) II/II
Davide Spadini, Fabio Palomba, Tobias Baum, Stefan
Hanenberg, Magiel Bruntink, Alberto Bacchelli: Test-driven code
review: an empirical study. ICSE 2019: 1061-1072
Eliane Stampfer Wiese, Anna N. Rafferty, Daniel M.
Kopta, Jacqulyn M. Anderson: Replicating novices' struggles
with coding style. ICPC 2019: 13-18
Fabiano Pecorelli, Fabio Palomba, Dario Di Nucci, Andrea De Lucia:
Comparing heuristic and machine learning approaches for
metric-based code smell detection. ICPC 2019: 93-104
Evolvability defects may
be missed when using
test-driven code review
Good style is somewhat
subjective; sometimes
arbitrary.
It's difficult to delegate
smell detection to ML.
30
• https://www.phacility.com/phabricator/
• https://www.jetbrains.com/upsource/
• https://aws.amazon.com/codeguru/
• https://www.codacy.com/
• ...
Related products
31
• Signal selection
• Code fixes
• Comments
• Commit summaries
• Test plans
• Review decisions
• Review comments
Some automation themes in code review automation
Let's look at a few paper representatives.
32
Signal selection
(Code review automation)
Mateusz Machalica, Alex Samylkin, Meredith Porth, Satish
Chandra: Predictive test selection.  ICSE
(SEIP) 2019: 91-100
Reduce infrastructural costs
for testing without missing
(much) faulty changes
33
Code fixes
(Code review automation)
Johannes Bader, Andrew Scott, Michael Pradel, Satish Chandra:
Getafix: learning to fix bugs automatically. Proc. ACM
Program. Lang. 3(OOPSLA): 159:1-159:27 (2019)
Saikat Chakraborty, Miltiadis Allamanis, Baishakhi Ray : CODIT:
Code Editing with Tree-Based Neural Machine
Translation. https://arxiv.org/abs/1810.00314 (2019)
Hussein Alrubaye, Mohamed Wiem Mkaouer, Ali Ouni: On the use
of information retrieval to automate the detection of
third-party Java library migration at the method level. 
ICPC 2019: 347-357
Tree differencing, anti-
unification and hierarchical
clustering
Neural networks instead.
Method mapping recovery
based on IR approach
34
Comments
(Code review automation)
Xing Hu, Ge Li, Xin Xia, David Lo, Zhi Jin: Deep code comment
generation with hybrid lexical and syntactical
information.  Empirical Software
Engineering 25(3): 2179-2217 (2020) and (by the same authors):
Deep code comment generation.  ICPC 2018: 200-210
Juan Zhai, Xiangzhe Xu, Yu Shi, Guanhong Tao, Minxue Pan, Shiqing Ma,
Lei Xu, Weifeng Zhang, Lin Tan, Xiangyu Zhang:
CPC: automatically classifying and propagating natural
language comments via program analysis. ICSE 2020.
AST to sequences,
followed by
neural machine translation
Scenarios:
(i) Generate missing comments
(ii) Use comments as assertions
35
Commit summaries
(Code review automation)
Jingjing Liang, Yaozong Hou, Shurui Zhou, Junjie Chen,
Yingfei Xiong, Gang Huang: How to Explain a Patch:
An Empirical Study of Patch Explanations in Open
Source Projects. ISSRE 2019: 58-69
Shurui Zhou, Stefan Stanciulescu, Olaf Leßenich, Yingfei
Xiong, Andrzej Wasowski, Christian Kästner: Identifying
features in forks. ICSE 2018: 105-116 and see also: Luyao
Ren, Shurui Zhou, Christian Kästner: Forks insight:
providing an overview of GitHub forks. ICSE
(Companion Volume) 2018: 179-180
To generate a patch explanation, it is
important to first understand how
patches were explained.
Fork summaries: compute a multi-
dimensional dependency graph from the
changed code (integrating def-use, control
flow, and adjacency), clusters of changed
syntax nodes are computed from the graph by
community detection -- the resulting clusters
are labelled by TF-IDF and friends.
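The final labelling step can be sketched with plain TF-IDF, treating each cluster of changed code as one document. This is a simplified illustration with made-up tokens, not the implementation from the cited papers, and it covers only the labelling, not the dependency graph or the community detection.

```python
import math
from collections import Counter

def tfidf_labels(clusters, top_n=2):
    """clusters: {cluster_id: non-empty list of tokens}.

    Returns the top_n tokens per cluster by TF-IDF score.
    """
    n = len(clusters)
    doc_freq = Counter()
    for tokens in clusters.values():
        doc_freq.update(set(tokens))
    labels = {}
    for cid, tokens in clusters.items():
        tf = Counter(tokens)
        scored = {
            tok: (count / len(tokens)) * math.log(n / doc_freq[tok])
            for tok, count in tf.items()
        }
        # Sort by descending score, then alphabetically for a stable result.
        labels[cid] = sorted(scored, key=lambda t: (-scored[t], t))[:top_n]
    return labels

# Made-up identifier tokens for three clusters of changed code.
clusters = {
    "c1": ["parse", "json", "parse", "token", "util"],
    "c2": ["render", "html", "render", "util"],
    "c3": ["cache", "evict", "cache", "util"],
}
labels = tfidf_labels(clusters)
print(labels)
```

A token such as `util` that occurs in every cluster gets an inverse document frequency of zero, so it never wins as a label.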
36
Review decisions
(Code review automation)
Shu-Ting Shi, Ming Li, David Lo, Ferdian Thung, Xuan Huo:
Automatic Code Review by Learning the Revision of
Source Code.  AAAI 2019: 4910-4917
Deep learning-based
approach which takes into
account context for changes
37
Review comments
(Code review automation)
Jing Kai Siow, Cuiyun Gao, Lingling Fan, Sen Chen, Yang Liu:
CORE: Automating Review Recommendation for
Code Changes. SANER 2020: 284-295
Anshul Gupta, Neel Sundaresan: Intelligent code reviews
using deep learning. KDD’18 Deep Learning Day, August
2018, London, UK
Deep learning / embedding
for suggesting review
comments for changes
NB: There is a notable difference between review comment
suggestion and linting based on learned rules! That is, ...
38
• How to make nitpicking obsolete?
• How to assess the reliability of a review?
• What is a good commit summary?
• What additional info to provide?
• What is anomalous code?
Challenges in code review automation
39
Diff
• Diff summary
• Diff test plan
• Commit
• CI signal
Miscellaneous
• Task (bug or feature)
• Alert
• Incident
• Root causing diff
Entities involved in code review (at Facebook)
How to improve code review automation?
Hypothesis -- It needs a combination of these:
• Knowledge graph
• Change impact analysis
• Traceability recovery
• Summaries
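As a toy illustration of the knowledge-graph part of this hypothesis, the sketch below links entities of the kinds listed on the slide and recovers which diffs are reachable from an incident. All entity IDs and edges are invented, and breadth-first search stands in for whatever traceability recovery a real system would use.

```python
from collections import deque

# Hypothetical edges between code-review entities (all IDs are made up).
EDGES = [
    ("incident:I1", "alert:A1"),
    ("alert:A1", "diff:D7"),
    ("diff:D7", "commit:C3"),
    ("diff:D7", "task:T2"),
    ("diff:D9", "task:T5"),
]

def build_graph(edges):
    """Build an undirected adjacency map from entity pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def related(graph, start, kind):
    """Breadth-first search: entities of the given kind reachable from start."""
    seen, queue, found = {start}, deque([start]), []
    while queue:
        node = queue.popleft()
        if node != start and node.startswith(kind + ":"):
            found.append(node)
        for neighbor in sorted(graph.get(node, ())):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return found

graph = build_graph(EDGES)
print(related(graph, "incident:I1", "diff"))
```

Here the incident is traced to diff D7 via its alert, while the unconnected diff D9 is not reported.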
40
• Developer workflow analysis,
• ownership management, and
• code review automation?
Any comprehension challenges other than in ...
Of course:
• Provenance (privacy)
• Dependencies (reliability)
• ...
41
Thanks!
Let's discuss.

More Related Content

What's hot

Software Engineering - chp5- software architecture
Software Engineering - chp5- software architectureSoftware Engineering - chp5- software architecture
Software Engineering - chp5- software architectureLilia Sfaxi
 
Innoslate, A Model-Based Systems Engineering Tool
Innoslate, A Model-Based Systems Engineering ToolInnoslate, A Model-Based Systems Engineering Tool
Innoslate, A Model-Based Systems Engineering ToolElizabeth Steiner
 
Data Designs (Software Engg.)
Data Designs (Software Engg.)Data Designs (Software Engg.)
Data Designs (Software Engg.)Arun Shukla
 
9 requirements engineering2
9 requirements engineering29 requirements engineering2
9 requirements engineering2Lilia Sfaxi
 
Software Architecture and Design
Software Architecture and DesignSoftware Architecture and Design
Software Architecture and DesignRa'Fat Al-Msie'deen
 
unit 5 Architectural design
 unit 5 Architectural design unit 5 Architectural design
unit 5 Architectural designdevika g
 
Software Architecture and Design Introduction
Software Architecture and Design IntroductionSoftware Architecture and Design Introduction
Software Architecture and Design IntroductionUsman Khan
 
Software architecture for developers by Simon Brown
Software architecture for developers by Simon BrownSoftware architecture for developers by Simon Brown
Software architecture for developers by Simon BrownCodemotion
 
Formal approaches to software architecture design thesis presentation
Formal approaches to software architecture design   thesis presentationFormal approaches to software architecture design   thesis presentation
Formal approaches to software architecture design thesis presentationNacha Chondamrongkul
 
Software design, software engineering
Software design, software engineeringSoftware design, software engineering
Software design, software engineeringRupesh Vaishnav
 
Software Architecture: introduction to the abstraction
Software Architecture: introduction to the abstractionSoftware Architecture: introduction to the abstraction
Software Architecture: introduction to the abstractionHenry Muccini
 
Enabling high level application development for internet of things
Enabling high level application development for internet of thingsEnabling high level application development for internet of things
Enabling high level application development for internet of thingsPankesh Patel
 
LIFT: A Legacy InFormation retrieval Tool
LIFT: A Legacy InFormation retrieval ToolLIFT: A Legacy InFormation retrieval Tool
LIFT: A Legacy InFormation retrieval ToolKellyton Brito
 
Reverse Engineering Web Applications
Reverse Engineering Web ApplicationsReverse Engineering Web Applications
Reverse Engineering Web ApplicationsPorfirio Tramontana
 
Oose unit 1 ppt
Oose unit 1 pptOose unit 1 ppt
Oose unit 1 pptDr VISU P
 
Cs 1023 lec 1 big idea (week 1)
Cs 1023 lec 1   big idea (week 1)Cs 1023 lec 1   big idea (week 1)
Cs 1023 lec 1 big idea (week 1)stanbridge
 

What's hot (19)

Software Engineering - chp5- software architecture
Software Engineering - chp5- software architectureSoftware Engineering - chp5- software architecture
Software Engineering - chp5- software architecture
 
Innoslate Overview
Innoslate OverviewInnoslate Overview
Innoslate Overview
 
Innoslate, A Model-Based Systems Engineering Tool
Innoslate, A Model-Based Systems Engineering ToolInnoslate, A Model-Based Systems Engineering Tool
Innoslate, A Model-Based Systems Engineering Tool
 
Data Designs (Software Engg.)
Data Designs (Software Engg.)Data Designs (Software Engg.)
Data Designs (Software Engg.)
 
9 requirements engineering2
9 requirements engineering29 requirements engineering2
9 requirements engineering2
 
Software Architecture and Design
Software Architecture and DesignSoftware Architecture and Design
Software Architecture and Design
 
unit 5 Architectural design
 unit 5 Architectural design unit 5 Architectural design
unit 5 Architectural design
 
Software Architecture and Design Introduction
Software Architecture and Design IntroductionSoftware Architecture and Design Introduction
Software Architecture and Design Introduction
 
Software architecture for developers by Simon Brown
Software architecture for developers by Simon BrownSoftware architecture for developers by Simon Brown
Software architecture for developers by Simon Brown
 
Formal approaches to software architecture design thesis presentation
Formal approaches to software architecture design   thesis presentationFormal approaches to software architecture design   thesis presentation
Formal approaches to software architecture design thesis presentation
 
Software design, software engineering
Software design, software engineeringSoftware design, software engineering
Software design, software engineering
 
Software Architecture: introduction to the abstraction
Software Architecture: introduction to the abstractionSoftware Architecture: introduction to the abstraction
Software Architecture: introduction to the abstraction
 
Enabling high level application development for internet of things
Enabling high level application development for internet of thingsEnabling high level application development for internet of things
Enabling high level application development for internet of things
 
LIFT: A Legacy InFormation retrieval Tool
LIFT: A Legacy InFormation retrieval ToolLIFT: A Legacy InFormation retrieval Tool
LIFT: A Legacy InFormation retrieval Tool
 
Reverse Engineering Web Applications
Reverse Engineering Web ApplicationsReverse Engineering Web Applications
Reverse Engineering Web Applications
 
Tg06
Tg06Tg06
Tg06
 
Software Design Concepts
Software Design ConceptsSoftware Design Concepts
Software Design Concepts
 
Oose unit 1 ppt
Oose unit 1 pptOose unit 1 ppt
Oose unit 1 ppt
 
Cs 1023 lec 1 big idea (week 1)
Cs 1023 lec 1   big idea (week 1)Cs 1023 lec 1   big idea (week 1)
Cs 1023 lec 1 big idea (week 1)
 

Similar to Keynote at-icpc-2020

Software Analytics - Achievements and Challenges
Software Analytics - Achievements and ChallengesSoftware Analytics - Achievements and Challenges
Software Analytics - Achievements and ChallengesTao Xie
 
TechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - Trivadis
TechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - TrivadisTechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - Trivadis
TechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - TrivadisTrivadis
 
OSCON 2014: Data Workflows for Machine Learning
OSCON 2014: Data Workflows for Machine LearningOSCON 2014: Data Workflows for Machine Learning
OSCON 2014: Data Workflows for Machine LearningPaco Nathan
 
Software Mining and Software Datasets
Software Mining and Software DatasetsSoftware Mining and Software Datasets
Software Mining and Software DatasetsTao Xie
 
Analyzing Big Data's Weakest Link (hint: it might be you)
Analyzing Big Data's Weakest Link  (hint: it might be you)Analyzing Big Data's Weakest Link  (hint: it might be you)
Analyzing Big Data's Weakest Link (hint: it might be you)HPCC Systems
 
Lambda Architecture and open source technology stack for real time big data
Lambda Architecture and open source technology stack for real time big dataLambda Architecture and open source technology stack for real time big data
Lambda Architecture and open source technology stack for real time big dataTrieu Nguyen
 
Big Data: the weakest link
Big Data: the weakest linkBig Data: the weakest link
Big Data: the weakest linkCS, NcState
 
Bridging the Gap: from Data Science to Production
Bridging the Gap: from Data Science to ProductionBridging the Gap: from Data Science to Production
Bridging the Gap: from Data Science to ProductionFlorian Wilhelm
 
Seminar VU Amsterdam 2015
Seminar VU Amsterdam 2015Seminar VU Amsterdam 2015
Seminar VU Amsterdam 2015Philipp Leitner
 
ACM Chicago March 2019 meeting: Software Engineering and AI - Prof. Tao Xie, ...
ACM Chicago March 2019 meeting: Software Engineering and AI - Prof. Tao Xie, ...ACM Chicago March 2019 meeting: Software Engineering and AI - Prof. Tao Xie, ...
ACM Chicago March 2019 meeting: Software Engineering and AI - Prof. Tao Xie, ...ACM Chicago
 
Intelligent Software Engineering: Synergy between AI and Software Engineering
Intelligent Software Engineering: Synergy between AI and Software EngineeringIntelligent Software Engineering: Synergy between AI and Software Engineering
Intelligent Software Engineering: Synergy between AI and Software EngineeringTao Xie
 
Data Workflows for Machine Learning - Seattle DAML
Data Workflows for Machine Learning - Seattle DAMLData Workflows for Machine Learning - Seattle DAML
Data Workflows for Machine Learning - Seattle DAMLPaco Nathan
 
[2015/2016] Software systems engineering PRINCIPLES
[2015/2016] Software systems engineering PRINCIPLES[2015/2016] Software systems engineering PRINCIPLES
[2015/2016] Software systems engineering PRINCIPLESIvano Malavolta
 
[DSC Europe 23] Igor Ilic - Redefining User Experience with Large Language Mo...
[DSC Europe 23] Igor Ilic - Redefining User Experience with Large Language Mo...[DSC Europe 23] Igor Ilic - Redefining User Experience with Large Language Mo...
[DSC Europe 23] Igor Ilic - Redefining User Experience with Large Language Mo...DataScienceConferenc1
 
Studying Software Engineering Patterns for Designing Machine Learning Systems
Studying Software Engineering Patterns for Designing Machine Learning SystemsStudying Software Engineering Patterns for Designing Machine Learning Systems
Studying Software Engineering Patterns for Designing Machine Learning SystemsHironori Washizaki
 
Managing Agile Software Development Projects
Managing Agile Software Development ProjectsManaging Agile Software Development Projects
Managing Agile Software Development ProjectsMartina Šimičić
 

Similar to Keynote at-icpc-2020 (20)

Software Analytics - Achievements and Challenges
Software Analytics - Achievements and ChallengesSoftware Analytics - Achievements and Challenges
Software Analytics - Achievements and Challenges
 
TechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - Trivadis
TechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - TrivadisTechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - Trivadis
TechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - Trivadis
 
OSCON 2014: Data Workflows for Machine Learning
OSCON 2014: Data Workflows for Machine LearningOSCON 2014: Data Workflows for Machine Learning
OSCON 2014: Data Workflows for Machine Learning
 
Software Mining and Software Datasets
Software Mining and Software DatasetsSoftware Mining and Software Datasets
Software Mining and Software Datasets
 
Msr2021 tutorial-di penta
Msr2021 tutorial-di pentaMsr2021 tutorial-di penta
Msr2021 tutorial-di penta
 
Analyzing Big Data's Weakest Link (hint: it might be you)
Analyzing Big Data's Weakest Link  (hint: it might be you)Analyzing Big Data's Weakest Link  (hint: it might be you)
Analyzing Big Data's Weakest Link (hint: it might be you)
 
Lambda Architecture and open source technology stack for real time big data
Lambda Architecture and open source technology stack for real time big dataLambda Architecture and open source technology stack for real time big data
Lambda Architecture and open source technology stack for real time big data
 
Big Data: the weakest link
Big Data: the weakest linkBig Data: the weakest link
Big Data: the weakest link
 
Bridging the Gap: from Data Science to Production
Bridging the Gap: from Data Science to ProductionBridging the Gap: from Data Science to Production
Bridging the Gap: from Data Science to Production
 
OOP ppt.pdf
OOP ppt.pdfOOP ppt.pdf
OOP ppt.pdf
 
Seminar VU Amsterdam 2015
Seminar VU Amsterdam 2015Seminar VU Amsterdam 2015
Seminar VU Amsterdam 2015
 
ACM Chicago March 2019 meeting: Software Engineering and AI - Prof. Tao Xie, ...
ACM Chicago March 2019 meeting: Software Engineering and AI - Prof. Tao Xie, ...ACM Chicago March 2019 meeting: Software Engineering and AI - Prof. Tao Xie, ...
ACM Chicago March 2019 meeting: Software Engineering and AI - Prof. Tao Xie, ...
 
Intelligent Software Engineering: Synergy between AI and Software Engineering
Intelligent Software Engineering: Synergy between AI and Software EngineeringIntelligent Software Engineering: Synergy between AI and Software Engineering
Intelligent Software Engineering: Synergy between AI and Software Engineering
 
Data Workflows for Machine Learning - Seattle DAML
Data Workflows for Machine Learning - Seattle DAMLData Workflows for Machine Learning - Seattle DAML
Data Workflows for Machine Learning - Seattle DAML
 
Task Complexity Metrics - Ben Colborn
Task Complexity Metrics - Ben ColbornTask Complexity Metrics - Ben Colborn
Task Complexity Metrics - Ben Colborn
 
[2015/2016] Software systems engineering PRINCIPLES
[2015/2016] Software systems engineering PRINCIPLES[2015/2016] Software systems engineering PRINCIPLES
[2015/2016] Software systems engineering PRINCIPLES
 
[DSC Europe 23] Igor Ilic - Redefining User Experience with Large Language Mo...
[DSC Europe 23] Igor Ilic - Redefining User Experience with Large Language Mo...[DSC Europe 23] Igor Ilic - Redefining User Experience with Large Language Mo...
[DSC Europe 23] Igor Ilic - Redefining User Experience with Large Language Mo...
 
Proposed Talk Outline for Pycon2017
Proposed Talk Outline for Pycon2017 Proposed Talk Outline for Pycon2017
Proposed Talk Outline for Pycon2017
 
Studying Software Engineering Patterns for Designing Machine Learning Systems
Studying Software Engineering Patterns for Designing Machine Learning SystemsStudying Software Engineering Patterns for Designing Machine Learning Systems
Studying Software Engineering Patterns for Designing Machine Learning Systems
 
Managing Agile Software Development Projects
Managing Agile Software Development ProjectsManaging Agile Software Development Projects
Managing Agile Software Development Projects
 

Keynote at-icpc-2020

  • 1. Comprehension Challenges at the Level of Software Ecosystems and Global Software Engineering Keynote by Ralf Lämmel, Facebook London Virtual ICPC 2020, July 2020
  • 2. 2 Infrastructure Version control, CI, language services, testing automation, ... Data infrastructure Storage engines, query engines, pipelines, metastores, ... AI infrastructure ML workflows, feature stores, online/offline prediction, ... Context ➖ Software engineering in infrastructure at Facebook
  • 3. 3 • App development • Service development • Internal tool development • Release management • Bug / incident tracking • Language foundation (e.g., Hack + ORM + frameworks + ...) • Data warehouse (e.g., Hive, Spark, Dataswarm pipelines) Context ➖ Software ecosystems at Facebook "A software ecosystem is a collection of software projects which are developed and which co-evolve together in the same environment. [...] The environment can be physical, like in the case of a company or a research group that has a geo-spatial identity, but can also be virtual, like the projects that are part of an open-source community." Source: "Reverse Engineering Software Ecosystems". Dissertation. Mircea F. Lungu. University of Lugano. 2009.
  • 4. 4 • Engineering in different time zones • Geographically distributed teams • Different employee types • Frequent org / team / role changes • Diversity and inclusion Context ➖ Global software engineering at Facebook "Companies need to use their existing resources as effectively as possible, and they also need to employ resources on a global scale from different sites within the company and from partner companies throughout the world. This has resulted in global software engineering (GSE) [...]" Source: "Global software engineering. Challenges and solutions framework". Dissertation. Päivi Parviainen. University of Oulu. 2012. Top topic in (IC)GSE: • Team • Project • Collaboration • Process • Communication • ... Source: Christof Ebert, Marco Kuhrmann, Rafael Prikladnicki: Global Software Engineering: Evolution and Trends. ICGSE 2016: 144-153
  • 5. 5 • Developer workflow analysis • Ownership management • Code review automation Comprehension challenges in ... It's all about automation -- think heuristics, ML. Focus on "infrastructure" here -- no apps / user data Big Data, anyway! How does program comprehension help to address these areas? What are the involved or remaining challenges?
  • 6. 6 Comprehension Challenges in Developer Workflow Analysis Let's focus here on work-item prediction. See the corresponding industry track paper.
  • 7. 7 Scenarios of work-item prediction I/II The ‘Incident Response’ Scenario: • Work item: Alert for suboptimal performance • Question: The workflow steps to follow in response • Automation: Record steps in past instances • Challenge: To know when someone is responding
  • 8. 8 Scenarios of work-item prediction II/II The ‘Aggregate Performance’ Scenario: • Work item: A diff (a system change) • Question: Time spent on diff • Automation: Record all activities on diff • Challenge: To know when someone is working on the diff
  • 9. 9 Dark matter in developer workflow analysis [Figure: timeline of a developer with the events "Query DB interactively", "Commit a version locally", "Read documentation", "Publish a diff", "Publish a diff", and "Review a diff".] The events on the timeline concern different ‘diffs’ (i.e., system changes all the way from committing a change locally to landing the change in production) as work items. White events are trivially associated with diffs. Gray events require dedicated data integration for association. Black events are hard to associate; advanced heuristics and machine learning may be of [...]
  • 10. 10 Probabilistic work-item prediction [Figure: the same timeline of a developer with the events "Query DB interactively", "Commit a version locally", "Read documentation", "Publish a diff", "Publish a diff", and "Review a diff", now annotated with the probabilities 1.0, .8, .3, .5, and .1.] The events on the timeline concern different ‘diffs’ (i.e., system changes all the way from committing a change locally to landing the change in production) as work items. White events are trivially associated with diffs. Gray events require dedicated data integration for association. Black events are hard to associate; advanced heuristics and machine learning may be of [...]
  • 11. 11 • Tools don't track work items consistently. • Tools aren't fully integrated. • Logging is not designed with workflow analysis in mind. • Developer workflow is somewhat unstructured. • Developers engage in a lot of context switching. • ... Why do we have dark matter? Also known elsewhere as: Sukriti Goel, Jyoti M. Bhat, and Barbara Weber. 2013. End-to-End Process Extraction in Process Unaware Systems. In Business Process Management Workshops - BPM 2012 International Workshops. Revised Papers (Lecture Notes in Business Information Processing), Vol. 132. Springer, 162–173.
  • 12. 12 • Tools ➖ added, obsoleted, removed • Tool functionality ➖ added, removed, revised (new version) • Interface ➖ added or removed form, revised schema or semantics • Integration with other tools or into suites evolves • Logging ➖ schema or semantics evolves • Best practices and use cases evolve For instance: consider aspects of tooling! We need automation (ML heuristics) -- reverse engineering and re-engineering don't scale!
  • 13. 13 Context switching in development Figure 2: Number of (selected) tools used per employee on a given day for many of Facebook’s employees. Figure 3: Concurrent workflow by a developer on several diffs (y-axis) over a few days (x-axis). We need more than time proximity and high-confidence events.
  • 14. 14 A system for diff prediction See the corresponding industry track paper. System component / notion: Explanation. Logging foundation: Integrate all available logs: version control, continuous integration, CLI, internal web-based tools, ... Time windows into dark matter: Use windows of 10 minutes. Wanted: the probability of the employee working on a diff. Candidate work items: Anything the employee may have possibly worked on. Probabilistic ranking based on ML / heuristics: High confidence, e.g., employee submitted diff revision. Low confidence, e.g., employee queried table mentioned in diff.
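The components listed on slide 14 can be illustrated with a minimal Python sketch. It is not the system from the industry track paper: the signal names, their weights, and the normalization of scores into probabilities are all assumptions made for illustration. Only the 10-minute window and the two example signals (a submitted diff revision as high confidence, a query on a table mentioned in the diff as low confidence) come from the slide.

```python
from collections import defaultdict
from dataclasses import dataclass

WINDOW_SECONDS = 10 * 60  # the 10-minute windows mentioned on the slide

# Hypothetical signal weights; the numbers are illustrative assumptions only.
SIGNAL_WEIGHTS = {
    "submitted_diff_revision": 1.0,  # high-confidence signal
    "queried_table_in_diff": 0.3,    # low-confidence signal
}


@dataclass(frozen=True)
class Event:
    timestamp: int  # seconds since some epoch
    signal: str     # key into SIGNAL_WEIGHTS
    diff_id: str    # candidate work item the event may relate to


def rank_candidates(events):
    """Return {window_start: [(diff_id, probability), ...]}, best candidate first."""
    scores = defaultdict(lambda: defaultdict(float))
    for event in events:
        window = event.timestamp - event.timestamp % WINDOW_SECONDS
        weight = SIGNAL_WEIGHTS.get(event.signal, 0.0)
        # Keep the strongest signal seen for each candidate within the window.
        scores[window][event.diff_id] = max(scores[window][event.diff_id], weight)

    ranking = {}
    for window, per_diff in scores.items():
        total = sum(per_diff.values())
        if total == 0:
            continue  # no usable signal in this window
        # Assumption: normalize scores so they sum to 1 within a window.
        ranking[window] = sorted(
            ((diff_id, score / total) for diff_id, score in per_diff.items()),
            key=lambda pair: pair[1],
            reverse=True,
        )
    return ranking


events = [
    Event(0, "submitted_diff_revision", "D1"),
    Event(120, "queried_table_in_diff", "D2"),
    Event(700, "queried_table_in_diff", "D2"),
]
ranking = rank_candidates(events)
```

In this toy input, the first window ranks `D1` above `D2` because the high-confidence signal dominates, while the second window contains only `D2`. A learned model would replace the hand-set weights.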
  • 15. 15 Related work discussion on developer workflow analysis I/II Wouter Poncin, Alexander Serebrenik, Mark van den Brand: Process Mining Software Repositories. CSMR 2011: 5-14 Roberto Minelli, Michele Lanza: Visualizing the workflow of developers. VISSOFT 2013: 1-4 Kostadin Damevski, Hui Chen, David C. Shepherd, Nicholas A. Kraft, Lori L. Pollock: Predicting Future Developer Behavior in the IDE Using Topic Models.  IEEE Trans. Software Eng. 44(11): 1100-1111 (2018) Case studies on developer role and bug lifecycle. Visualization of workflows in an IDE. Predict whether someone continues debugging or starts editing.
  • 16. 16 Related work discussion on developer workflow analysis II/II Diogo R. Ferreira, Daniel Gillblad: Discovering Process Models from Unlabelled Event Logs.  BPM 2009: 143-158 Niek Tax, Natalia Sidorova, Reinder Haakma, Wil M. P. van der Aalst: Event Abstraction for Process Mining using Supervised Learning Techniques.  CoRR abs/1606.07283 (2016) R. P. Jagadeesh Chandra Bose, Wil M. P. van der Aalst, Indre Zliobaite, Mykola Pechenizkiy: Dealing With Concept Drifts in Process Mining.  IEEE Trans. Neural Networks Learn. Syst. 25(1): 154-171 (2014) Pieter De Koninck, Seppe vanden Broucke, Jochen De Weerdt: act2vec, trace2vec, log2vec, and model2vec: Representation Learning for Business Processes. BPM 2018: 305-321 Discovery of process models with case IDs ML for event abstraction Concept drift refers to the situation in which the process is changing while being analyzed. Representation learning for trace clustering and process model comparison
  • 17. 17 • Developer workflow analysis • Ownership management • Code review automation Comprehension challenges in ...
  • 18. 18 Comprehension Challenges in Ownership Management This relates to our work on Ownesty. See the corresponding industry track paper.
  • 19. 19 What's ownership management? Each asset has the most accountable owner at all times. Software data assets: Hive tables, Pipelines, ML models, Files in repos, ... POC for all means regarding reliability, security, privacy, et al.
  • 20. 20 Architecture of an Ownesty-style Ownership Recommendation System [Figure: architecture diagram with the labels Metastore, Assets, Logs, Extraction, Composition, Features, Feature vectors, Labeling events, Labeling, Labeled data, Training/Test, Interpretable models, Prediction, Predictions, Sugar coating, Explainable recommendations, Tooling/tasks.] For instance: Employee e queried table t.
  • 21. 21 Basic challenges in ownership management See the corresponding industry track paper. Challenge: Details. Ownership decay: How to know whether to trust owners on file? Asset subclassing: How to identify and handle specific subsets of assets? Team-level ownership: How to assign teams as owners with individual signal? Ranking owner candidates: What ranking to use to recommend one or more candidates? Whole/part asset relationships: How to obey those relationships with recommendations? Monotonic features: How to make sure that more means more likely owner? Explainable recommendations: How to explain recommendations to users so that they accept them?
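Three of the challenges on slide 21 (ranking owner candidates, monotonic features, explainable recommendations) can be illustrated together in a small Python sketch. This is not Ownesty's model. It assumes a linear score with non-negative weights, which is one simple way to guarantee that a larger feature value never lowers a candidate's score. The feature names and weights are hypothetical.

```python
# Hypothetical non-negative weights. Non-negativity makes every feature
# monotonic: a larger count can only keep or raise the score.
FEATURE_WEIGHTS = {"queries": 0.5, "commits": 2.0, "oncall_shifts": 1.0}


def score(features):
    """Linear score over per-candidate feature counts (missing features count as 0)."""
    return sum(w * features.get(name, 0) for name, w in FEATURE_WEIGHTS.items())


def recommend(candidates, top_k=1):
    """candidates: {employee: {feature: count}} -> [(employee, score, explanation)]."""
    ranked = sorted(candidates.items(), key=lambda kv: score(kv[1]), reverse=True)
    results = []
    for employee, features in ranked[:top_k]:
        contributions = {
            name: w * features.get(name, 0) for name, w in FEATURE_WEIGHTS.items()
        }
        # Explain the recommendation by its largest contributing feature.
        top_feature = max(contributions, key=contributions.get)
        explanation = f"{employee} is recommended mainly because of '{top_feature}'"
        results.append((employee, score(features), explanation))
    return results


candidates = {
    "alice": {"queries": 10, "commits": 1},
    "bob": {"queries": 2, "commits": 5},
}
best = recommend(candidates)[0]
```

Here `bob` outranks `alice` because commits carry more weight than queries, and the explanation names that feature. An interpretable model of this kind trades accuracy for the ability to tell a user why a candidate was suggested.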
  • 22. 22 • Team level • Split • Merger • Termination • Individual level • Team move • Function change • Hack a month • Types of teams • Oncall rotations • Reporting teams • Organizations • Ad-hoc teams • Types of functions • Engineer • Manager • FTE/STE/intern • Data scientist For instance: Team-level ownership -- consider team changes!
  • 23. 23 Heterogeneity of Owned Assets • Reviewer recommendation Dependency Awareness • Call graph, variability, package management, build management, traceability recovery, lineage, provenance, ..., feature location, slicing Workflow and Organizational Aspects • Project management, process mining Understandable Recommendations • Interpretable models, explainable recommendations, counterfactuals Open problems and challenges -- some related areas See the industry track paper for details.
  • 24. 24 Related work discussion on ownership management Yue Yu, Huaimin Wang, Gang Yin, Tao Wang: Reviewer recommendation for pull-requests in GitHub: What can we learn from code review and bug assignment? Inf. Softw. Technol. 74: 204-218 (2016) Vaibhavi Kalgutkar, Ratinder Kaur, Hugo Gonzalez, Natalia Stakhanova, Alina Matyukhina: Code Authorship Attribution: Methods and Challenges. ACM Comput. Surv. 52(1): 3:1-3:36 (2019) Bixin Li, Xiaobing Sun, Hareton Leung, Sai Zhang: A survey of code-based change impact analysis techniques. Softw. Test. Verification Reliab. 23(8): 613-646 (2013) Find the approach with the best recommendation performance A neighboring area related to plagiarism and malware detection Useful for tracking ownership along dependencies!?
  • 25. 25 • Developer workflow analysis • Ownership management • Code review automation Comprehension challenges in ...
  • 27. 27 Code review -- all great? The most frequent reasons for confusion are the missing rationale, discussion of non-functional requirements of the solution, and lack of familiarity with existing code. We observe that tools (code review, issue tracker, and version control) and communication issues, such as disagreement or ambiguity in communicative intentions, may also cause confusion during code reviews. Source: Felipe Ebert, Fernando Castor, Nicole Novielli, Alexander Serebrenik: Confusion in Code Reviews: Reasons, Impacts, and Coping Strategies.  SANER 2019: 49-60
  • 28. 28 Background on code review (automation) I/II Mika Mäntylä, Casper Lassenius: What Types of Defects Are Really Discovered in Code Reviews?  IEEE Trans. Software Eng. 35(3): 430-448 (2009) Devarshi Singh, Varun Ramachandra Sekar, Kathryn T. Stolee, Brittany Johnson: Evaluating how static analysis tools can reduce code review effort.  VL/HCC 2017: 101-105 Laura MacLeod, Michaela Greiler, Margaret-Anne D. Storey, Christian Bird, Jacek Czerwonka: Code Reviewing in the Trenches: Challenges and Best Practices.  IEEE Softw. 35(4): 34-42 (2018) Jacek Czerwonka, Michaela Greiler, Christian Bird, Lucas Panjer, Terry Coatta: CodeFlow: Improving the Code Review Process at Microsoft.  ACM Queue 16(5): 20 (2018) Finding relevant documentation about changes was another frequently reported challenge: 'what it’s doing and how it’s integrated with everything else.' Functional versus evolvability defects. PMD preempts 16% of comments. Another 17% could be implemented. Open fundamental factor: how to avoid code changes with unrelated concerns
  • 29. 29 Background on code review (automation) II/II Davide Spadini, Fabio Palomba, Tobias Baum, Stefan Hanenberg, Magiel Bruntink, Alberto Bacchelli: Test-driven code review: an empirical study. ICSE 2019: 1061-1072 Eliane Stampfer Wiese, Anna N. Rafferty, Daniel M. Kopta, Jacqulyn M. Anderson: Replicating novices' struggles with coding style. ICPC 2019: 13-18 Fabiano Pecorelli, Fabio Palomba, Dario Di Nucci, Andrea De Lucia: Comparing heuristic and machine learning approaches for metric-based code smell detection. ICPC 2019: 93-104 Evolvability defects may be missed when using test-driven code review Good style is somewhat subjective; sometimes arbitrary. It's difficult to delegate smell detection to ML.
  • 30. 30 • https://www.phacility.com/phabricator/ • https://www.jetbrains.com/upsource/ • https://aws.amazon.com/codeguru/ • https://www.codacy.com/ • ... Related products
  • 31. 31 • Signal selection • Code fixes • Comments • Commit summaries • Test plans • Review decisions • Review comments Some automation themes in code review automation Let's look at a few paper representatives.
  • 32. 32 Signal selection (Code review automation) Mateusz Machalica, Alex Samylkin, Meredith Porth, Satish Chandra: Predictive test selection.  ICSE (SEIP) 2019: 91-100 Reduce infrastructural costs for testing without missing (much) faulty changes
  • 33. 33 Code fixes (Code review automation) Johannes Bader, Andrew Scott, Michael Pradel, Satish Chandra: Getafix: learning to fix bugs automatically. Proc. ACM Program. Lang. 3(OOPSLA): 159:1-159:27 (2019) Saikat Chakraborty, Miltiadis Allamanis, Baishakhi Ray: CODIT: Code Editing with Tree-Based Neural Machine Translation. https://arxiv.org/abs/1810.00314 (2019) Hussein Alrubaye, Mohamed Wiem Mkaouer, Ali Ouni: On the use of information retrieval to automate the detection of third-party Java library migration at the method level.  ICPC 2019: 347-357 Tree differencing, anti-unification and hierarchical clustering. Neural networks instead. Method mapping recovery based on an IR approach.
  • 34. 34 Comments (Code review automation) Xing Hu, Ge Li, Xin Xia, David Lo, Zhi Jin: Deep code comment generation with hybrid lexical and syntactical information.  Empirical Software Engineering 25(3): 2179-2217 (2020) and (by the same authors): Deep code comment generation.  ICPC 2018: 200-210 Juan Zhai, Xiangzhe Xu, Yu Shi, Guanhong Tao, Minxue Pan, Shiqing Ma, Lei Xu, Weifeng Zhang, Lin Tan, Xiangyu Zhang: CPC: automatically classifying and propagating natural language comments via program analysis. ICSE 2020. ASTs to sequences, followed by neural machine translation. Scenarios: (i) Generate missing comments (ii) Use comments as assertions
  • 35. 35 Commit summaries (Code review automation) Jingjing Liang, Yaozong Hou, Shurui Zhou, Junjie Chen, Yingfei Xiong, Gang Huang: How to Explain a Patch: An Empirical Study of Patch Explanations in Open Source Projects. ISSRE 2019: 58-69 Shurui Zhou, Stefan Stanciulescu, Olaf Leßenich, Yingfei Xiong, Andrzej Wasowski, Christian Kästner: Identifying features in forks. ICSE 2018: 105-116 and see also: Luyao Ren, Shurui Zhou, Christian Kästner: Forks insight: providing an overview of GitHub forks. ICSE (Companion Volume) 2018: 179-180 To generate a patch explanation, it is important to first understand how patches were explained. Fork summaries: compute a multi-dimensional dependency graph from the changed code (integrating def-use, control flow, and adjacency); clusters of changed syntax nodes are computed from the graph by community detection; the resulting clusters are labelled by TF-IDF and friends.
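The last step of the fork-summary pipeline on slide 35, labelling clusters by TF-IDF, can be sketched in plain Python. The sketch assumes that clusters of changed code have already been computed (the dependency graph and community detection steps are not shown) and that each cluster is given as a list of identifier tokens. The function name, the input shape, and the specific TF-IDF variant are illustrative assumptions, not the method of the cited paper.

```python
import math
from collections import Counter


def label_clusters(clusters, top_n=2):
    """clusters: {cluster_id: [token, ...]} -> {cluster_id: [top tokens by TF-IDF]}."""
    n_clusters = len(clusters)
    # Document frequency: in how many clusters does each token occur?
    doc_freq = Counter()
    for tokens in clusters.values():
        doc_freq.update(set(tokens))

    labels = {}
    for cluster_id, tokens in clusters.items():
        term_freq = Counter(tokens)
        scored = {
            token: (count / len(tokens)) * math.log(n_clusters / doc_freq[token])
            for token, count in term_freq.items()
        }
        # Sort by descending score; break ties alphabetically for determinism.
        ordered = sorted(scored.items(), key=lambda kv: (-kv[1], kv[0]))
        labels[cluster_id] = [token for token, _ in ordered[:top_n]]
    return labels


clusters = {
    "c1": ["cache", "cache", "evict", "util"],
    "c2": ["parser", "token", "util", "parser"],
}
labels = label_clusters(clusters)
```

Tokens that occur in every cluster, such as `util` here, get a score of zero and drop out of the labels, which is the property that makes TF-IDF useful for naming what is specific to a cluster.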
  • 36. 36 Review decisions (Code review automation) Shu-Ting Shi, Ming Li, David Lo, Ferdian Thung, Xuan Huo: Automatic Code Review by Learning the Revision of Source Code.  AAAI 2019: 4910-4917 Deep learning-based approach which takes into account context for changes
  • 37. 37 Review comments (Code review automation) Jing Kai Siow, Cuiyun Gao, Lingling Fan, Sen Chen, Yang Liu: CORE: Automating Review Recommendation for Code Changes. SANER 2020: 284-295 Anshul Gupta, Neel Sundaresan: Intelligent code reviews using deep learning. KDD’18 Deep Learning Day, August 2018, London, UK Deep learning / embedding for suggesting review comments for changes NB: There is a notable difference between review comment suggestion and linting based on learned rules! That is, ...
  • 38. 38 • How to make nitpicking obsolete? • How to assess the reliability of a review? • What is a good commit summary? • What additional info to provide? • What is anomalous code? Challenges in code review automation
  • 39. 39 Diff • Diff summary • Diff test plan • Commit • CI signal Miscellaneous • Task (bug or feature) • Alert • Incident • Root causing diff Entities involved in code review (at Facebook) How to improve code review automation? Hypothesis -- It needs a combination of these: • Knowledge graph • Change impact analysis • Traceability recovery • Summaries
  • 40. 40 • Developer workflow analysis, • ownership management, and • code review automation? Any comprehension challenges other than in ... Of course: • Provenance (privacy) • Dependencies (reliability) • ...