Intelligent Software Engineering:
Synergy between AI and Software
Engineering
Tao Xie
University of Illinois at Urbana-Champaign
taoxie@illinois.edu
http://taoxie.cs.illinois.edu/
Innovations in Software Engineering Conference (ISEC 2018)
Feb 9-11 2018, Hyderabad, India
Artificial Intelligence ↔ Software Engineering
Intelligent Software Engineering
1st International Workshop on Intelligent Software Engineering (WISE 2017)
Organizing Committee
• Tao Xie, University of Illinois at Urbana-Champaign, USA
• Abhik Roychoudhury, National University of Singapore, Singapore
• Wolfram Schulte, Facebook, USA
• Qianxiang Wang, Huawei, China
Sponsor:
Co-Located with ASE 2017
https://isofteng.github.io/wise2017/
Workshop Program
8 invited speakers
1 panel discussion
https://isofteng.github.io/wise2017/
International Workshop on Intelligent Software Engineering (WISE 2017)
Past: Automated Software Testing
• 10 years of collaboration with Microsoft Research on Pex
• .NET Test Generation Tool based on Dynamic Symbolic Execution
• Example Challenges
• Path explosion [DSN’09: Fitnex]
• Method sequence explosion [OOPSLA’11: Seeker]
• Shipped in Visual Studio 2015/2017 Enterprise Edition
• As IntelliTest
• Code Hunt [ICSE’15 JSEET] w/ > 6 million (6,114,978) users after 3.5 years
• Including registered users playing on www.codehunt.com, anonymous users, and accounts that access http://api.codehunt.com/ directly via the documented REST APIs
https://www.codehunt.com/
http://taoxie.cs.illinois.edu/publications/ase14-pexexperiences.pdf
Past: Android App Testing
• 2 years of collaboration with Tencent Inc. WeChat testing team
• Guided Random Test Generation Tool improved over Google Monkey
• Resulting tool deployed in daily WeChat testing practice
• WeChat = WhatsApp + Facebook + Instagram + PayPal + Uber …
• #monthly active users: 963 million @ 2017 Q2
• Daily#: tens of billions of messages sent, hundreds of millions of photos uploaded, hundreds of millions of payment transactions executed
• First studies on testing industrial Android apps [FSE’16 IN] [ICSE’17 SEIP]
• Beyond the open-source Android apps that academia focuses on
WeChat
http://taoxie.cs.illinois.edu/publications/esecfse17industry-replay.pdf
http://taoxie.cs.illinois.edu/publications/fse16industry-wechat.pdf
Next: Intelligent Software Testing(?)
• Learning from others working on the same things
• Our work on mining API usage method sequences to test the API
[ESEC/FSE’09: MSeqGen]
• Visser et al. Green: Reducing, reusing and recycling constraints in program
analysis. FSE’12.
• Learning from others working on similar things
• Jia et al. Enhancing reuse of constraint solutions to improve symbolic execution.
ISSTA’15.
• Aquino et al. Heuristically Matching Solution Spaces of Arithmetic Formulas to
Efficiently Reuse Solutions. ICSE’17.
[Jia et al. ISSTA’15]
Mining and Understanding Software Enclaves (MUSE)
http://materials.dagstuhl.de/files/15/15472/15472.SureshJagannathan1.Slides.pdf
DARPA
Pliny: Mining Big Code to help programmers (Rice U., UT Austin, Wisconsin, GrammaTech)
http://pliny.rice.edu/ http://news.rice.edu/2014/11/05/next-for-darpa-autocomplete-for-programmers-2/
$11 million (4 years)
Program Synthesis: NSF Expeditions in Computing
https://excape.cis.upenn.edu/
https://www.sciencedaily.com/releases/2016/08/160815134941.htm
$10 million (5 years)
Software related data are pervasive
Runtime traces
Program logs
System events
Perf counters
…
Usage log
User surveys
Online forum posts
Blog & Twitter
…
Source code
Bug history
Check-in history
Test cases
Keystrokes
…
In Collaboration with Microsoft Research Asia
Software analytics is to enable software practitioners to
perform data exploration and analysis in order to obtain
insightful and actionable information for data-driven tasks
around software and services.
http://taoxie.cs.illinois.edu/publications/malets11-analytics.pdf
Software Analytics
Past: Software Analytics
• StackMine [ICSE’12, IEEESoft’13]: performance debugging in the large
• Data Source: Performance call stack traces from Windows end users
• Analytics Output: Ranked clusters of call stack traces based on shared patterns
• Impact: Deployed/used in daily practice of Windows Performance Analysis team
• XIAO [ACSAC’12, ICSE’17 SEIP]: code-clone detection and search
• Data Source: Source code repos (+ given code segment optionally)
• Analytics Output: Code clones
• Impact: Shipped in Visual Studio 2012; deployed/used in daily practice of
Microsoft Security Response Center
Past: Software Analytics
• Service Analysis Studio [ASE’13-EX]: service incident management
• Data Source: Transaction logs, system metrics, past incident reports
• Analytics Output: Healing suggestions/likely root causes of the given incident
• Impact: Deployed and used by an important Microsoft service (hundreds of
millions of users) for incident management
Next: Intelligent Software Analytics(?)
Microsoft Research Asia - Software Analytics Group - Smart Data Discovery
IN4: INteractive, Intuitive, Instant, INsights
Quick Insights -> Microsoft Power BI
Gartner Magic Quadrant for Business
Intelligence & Analytics Platforms
Microsoft Research Asia - Software Analytics Group
https://www.hksilicon.com/articles/1213020
Existing Approaches on NL → Regular Expressions
[Ranta 1998], [Kushman and Barzilay 2013], [Locascio et al. 2016]
Used only synthetic data for training and testing
Are these approaches effective in addressing real-world situations?
Deep Learning for NL→Regex: Get Real!
Zhong et al. Generating Regular Expressions from Natural Language Specifications: Are We There Yet? In AAAI 2018
Workshop on NLP for Software Engineering (NL4SE 2018)
http://taoxie.cs.illinois.edu/publications/nl4se18-regex.pdf
Synthetic datasets
KB13 [Kushman and Barzilay 2013] (824 pairs)
• Write NL sentences to capture the example strings
NL-RX [Locascio et al., 2016] (10,000 pairs)
• Parse a regex and generate initial NL sentences based on a predefined grammar
• Paraphrase the generated sentences
Real-world dataset
RegexLib (3,619 pairs)
• From regexlib.com
Characteristic Study
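To make the NL-RX construction step concrete, here is a minimal Python sketch of grammar-based sentence generation. The tuple-based AST, the templates, and the names `to_regex`, `to_sentence`, and `CLASS_NAMES` are all hypothetical illustrations; they are not the actual NL-RX grammar.

```python
import re

# Hypothetical mini-grammar: one fixed English template per AST node type.
CLASS_NAMES = {"[0-9]": "a number", "[a-z]": "a lower-case letter"}

def to_regex(node):
    """Render a tiny regex AST (nested tuples) as a regex string."""
    kind = node[0]
    if kind == "lit":
        return re.escape(node[1])
    if kind == "class":
        return node[1]
    if kind == "star":
        return f"({to_regex(node[1])})*"
    if kind == "concat":
        return to_regex(node[1]) + to_regex(node[2])
    raise ValueError(f"unknown node type: {kind}")

def to_sentence(node):
    """Render the same AST as an English sentence using fixed templates."""
    kind = node[0]
    if kind == "lit":
        return f'the string "{node[1]}"'
    if kind == "class":
        return CLASS_NAMES[node[1]]
    if kind == "star":
        return f"{to_sentence(node[1])}, zero or more times"
    if kind == "concat":
        return f"{to_sentence(node[1])} followed by {to_sentence(node[2])}"
    raise ValueError(f"unknown node type: {kind}")

ast = ("concat", ("class", "[a-z]"), ("star", ("lit", "dog")))
regex = to_regex(ast)
sentence = to_sentence(ast)
print(regex)     # [a-z](dog)*
print(sentence)  # a lower-case letter followed by the string "dog", zero or more times
```

Because every sentence comes from a handful of templates, the vocabulary of such a dataset stays small, which is consistent with the low distinct-word counts reported for the synthetic datasets.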
Complexity of regular expressions
• Synthetic datasets support only a subset of the regex language: e.g., ‘?’ ∈ RegexLib, but ∉ NL-RX or KB13
Length statistics of regular expressions
• # of distinct words: 13,491 (RegexLib) vs. 715 (KB13) vs. 560 (NL-RX)
Complexity of NL sentences
#words statistics of NL sentences
Deep-Regex [Locascio et al. 2016]
Regular expression generation as machine translation
Experimental Study
Sequence-to-sequence learning
https://github.com/nicholaslocascio/deep-regex
https://aclweb.org/anthology/D/D16/D16-1197.pdf
String-Equal: exact-matching
DFA-Equal: semantically matching
Effectiveness on Synthetic Datasets
DFA: Deterministic Finite Automaton
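The difference between the two metrics can be illustrated with a short Python sketch. Note the assumption: instead of building and comparing DFAs, the helper `bounded_equal` (a hypothetical name) approximates semantic equality by checking every string up to a small length over a small alphabet, so it can miss differences that only show up on longer strings or other characters.

```python
import re
from itertools import product

def string_equal(r1, r2):
    """String-Equal: the two regexes are textually identical."""
    return r1 == r2

def bounded_equal(r1, r2, alphabet="ab1", max_len=5):
    """Approximation of DFA-Equal: compare acceptance on all short strings.

    Returns (True, None) if no difference is found, otherwise
    (False, witness) with the first string the regexes disagree on.
    """
    p1, p2 = re.compile(r1), re.compile(r2)
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            s = "".join(chars)
            if bool(p1.fullmatch(s)) != bool(p2.fullmatch(s)):
                return False, s
    return True, None

gold = "(a|b)*"
print(string_equal(gold, "[ab]*"))   # False: different text
print(bounded_equal(gold, "[ab]*"))  # (True, None): same language on all checked strings
print(bounded_equal(gold, "[ab]+"))  # (False, ''): only the gold regex accepts the empty string
```

A prediction such as `[ab]*` would count as wrong under String-Equal but as correct under a semantic metric, which is why the two accuracy figures can differ.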
Experiment settings
Use Deep-Regex to train a model on the synthetic NL-RX dataset
Build a test set (1,091 pairs) from RegexLib
Eliminate long NL sentences
Results
Without beam search: cannot generate any correct regex
Beam search (size: 20): generates correct regexes for 5 NL sentences (5 of 1,091, about 0.46%)
Huge drop in top-20 accuracy! (90.9% → 0.46%)
Experiments on Real-world Dataset
Variations of NL sentences
• NL-RX: NL sentences are generated from a predefined grammar
• Augmenting training data may alleviate the error
Numerical range
New Causes of Errors on Real-world Dataset
Description | Ground Truth | Predicted Result
Match the numbers 100 to 199. | 1[0-9][0-9] | ([0-9])*
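The gap between the two regexes in this row can be checked directly with Python's `re` module. This is a minimal sketch that assumes whole-string matching via `fullmatch`; the helper name `accepted` is illustrative.

```python
import re

gold = re.compile(r"1[0-9][0-9]")   # ground truth from the table
pred = re.compile(r"([0-9])*")      # predicted result from the table

def accepted(pattern, lo, hi):
    """Return the integers in [lo, hi) whose decimal form the pattern fully matches."""
    return [n for n in range(lo, hi) if pattern.fullmatch(str(n))]

gold_hits = accepted(gold, 0, 1000)
pred_hits = accepted(pred, 0, 1000)

print(gold_hits[0], gold_hits[-1], len(gold_hits))  # 100 199 100
print(len(pred_hits))                               # 1000
print(bool(pred.fullmatch("")))                     # True: even the empty string is accepted
```

The predicted regex accepts any digit string, so it captures "numbers" but loses the range constraint entirely.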
RegexLib is too sparse to be a sufficient training set
Collect sufficient labeled real-world data
Synthesize data to supplement the collected real-world data
Ongoing Work: Large Real-world Benchmark
Dataset | # Pairs | # Distinct words
NL-RX | 10,000 | 560
RegexLib | 3,619 | 13,491
String test cases can handle the ambiguity of NL sentences
String test cases can differentiate regular expression candidates and help select the best candidate during beam search
Description | Ground Truth | Predicted Result
Items with a small letter preceding “dog”, at least thrice | ([a-b].*dog.*){3,} | ([a-b]).*((dog){3,})
Test case: “adogadogadog”
Ongoing Work: Testability of Regular Expressions
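One way to picture how string test cases could help pick among beam-search candidates is the following Python sketch. It is an illustration under assumptions, not the authors' implementation: the functions `passes` and `select` are hypothetical, the beam is hard-coded from the table above, and whole-string matching (`fullmatch`) is assumed.

```python
import re

def passes(regex, positives, negatives=()):
    """True if the regex accepts every positive and rejects every negative string."""
    p = re.compile(regex)
    return (all(p.fullmatch(s) for s in positives)
            and not any(p.fullmatch(s) for s in negatives))

def select(candidates, positives, negatives=()):
    """Return the highest-ranked candidate consistent with the test cases, or None."""
    for regex in candidates:
        if passes(regex, positives, negatives):
            return regex
    return None

# Candidates in model-ranked order: the top prediction is the wrong one from the table.
beam = [r"([a-b]).*((dog){3,})", r"([a-b].*dog.*){3,}"]
chosen = select(beam, positives=["adogadogadog"], negatives=["adog"])
print(chosen)  # ([a-b].*dog.*){3,}
```

The top-ranked candidate requires three consecutive copies of "dog", so the single test string "adogadogadog" is enough to rule it out in favor of the ground truth.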
https://medium.com/ai-for-software-engineering/ai-for-software-engineering-industry-landscape-d8c7c7f82ba
AI for SE Startups Rooted from Research
• Oxford University spin-off, Daniel Kroening et al.: http://www.diffblue.com/
• Peking University spin-off, Ge Li et al.: http://aixcoder.com/
• Technion spin-off, Eran Yahav et al.: https://www.codota.com/
• Technical University Munich spin-off, Benedikt Hauptmann et al.: https://www.qualicen.de/en/
Open Topics in Intelligent Software Engineering (ISE)
• How to determine whether a software engineering tool is indeed
“intelligent”?
• Turing test for such tool?
• What sub-areas/problems in ISE shall the research community invest
efforts on as high priority?
• How to turn ISE research results into industrial/open source practice?
• …
White-House-Sponsored Workshop (2016 June 28)
http://www.cmu.edu/safartint/
Self-Driving Tesla Involved in Fatal Crash (2016 June 30)
http://www.nytimes.com/2016/07/01/business/self-driving-tesla-fatal-crash-investigation.html
“A Tesla car in autopilot crashed into a trailer because the autopilot system failed to recognize the trailer as an obstacle due to its ‘white color against a brightly lit sky’ and the ‘high ride height’”
http://www.cs.columbia.edu/~suman/docs/deepxplore.pdf
Microsoft's Teen Chatbot Tay
Turned into Genocidal Racist (2016 March 23/24)
http://www.businessinsider.com/ai-expert-explains-why-microsofts-tay-chatbot-is-so-racist-2016-3
"There are a number of precautionary
steps they [Microsoft] could have taken.
It wouldn't have been too hard to create
a blacklist of terms; or narrow the scope
of replies. They could also have simply
manually moderated Tay for the first few
days, even if that had meant slower
responses."
“businesses and other AI developers will
need to give more thought to the
protocols they design for testing and
training AIs like Tay.”
NSF New Program: Formal Methods in the Field
• Anticipated Funding: $8 million; #awards: 8
• Deadline: May 8, 2018
Machine Learning: The sheer complexity of machine learning algorithms
and their applications makes it hard to ensure correctness. Exploration of
new formal methods can be used to characterize boundaries of behavior,
and may bring much needed rigor to machine learning algorithms and
applications. These techniques could range from novel programming
languages and compilers for more robust machine learning to formal
verification techniques for machine learning systems that could provide
assurances of safety, correctness, and fairness. The interplay between
program synthesis and machine learning offers many interesting
possibilities to both improve machine learning and formal techniques.
https://www.nsf.gov/pubs/2018/nsf18536/nsf18536.htm
Problems in Testing ML Software
● ML Software suffers from the “no oracle problem”
○ Previous approach @Columbia U. on metamorphic testing: check satisfaction of a property with different inputs in equivalence classes
https://medium.com/trustableai/testing-ai-with-metamorphic-testing-61d690001f5c
● Inaccuracy may be desirable to avoid the overfitting problem
● Auto-generated test inputs have no expected outputs
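A minimal Python sketch of metamorphic testing for a 1-nearest-neighbor classifier follows. The two relations used here (reordering the training set, and scaling all features by the same positive factor, should not change predictions) are assumptions chosen for illustration, valid when there are no distance ties; they are not taken from the cited Columbia work. All function names are hypothetical.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def knn1_predict(train, query):
    """Label of the nearest training point; train is a list of (features, label)."""
    return min(train, key=lambda item: sq_dist(item[0], query))[1]

def first_label_bug(train, query):
    """Seeded fault: ignores distances and returns the first training label."""
    return train[0][1]

def permutation_violations(predict, train, queries):
    """Relation 1: reversing the training order must not change predictions."""
    reordered = list(reversed(train))
    return [q for q in queries if predict(train, q) != predict(reordered, q)]

def scaling_violations(predict, train, queries, factor=10.0):
    """Relation 2: scaling every feature by one positive factor must not change predictions."""
    scale = lambda point: tuple(x * factor for x in point)
    scaled = [(scale(f), y) for f, y in train]
    return [q for q in queries if predict(train, q) != predict(scaled, scale(q))]

train = [((0.0, 0.0), "a"), ((1.0, 1.0), "b"), ((4.0, 0.0), "c")]
queries = [(0.2, 0.1), (0.9, 1.2), (3.5, 0.5)]

print(permutation_violations(knn1_predict, train, queries))     # []
print(scaling_violations(knn1_predict, train, queries))         # []
print(permutation_violations(first_label_bug, train, queries))  # all three queries
```

No expected label is needed for any query: a violation of a relation is evidence of a fault even without an oracle, although passing both relations does not prove correctness.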
Multiple-Implementation Testing
http://taoxie.cs.illinois.edu/publications/edsmls18-mitest.pdf
Srisakaokul et al. Multiple-Implementation Testing of Supervised Learning Software. In AAAI-18 Workshop on Engineering Dependable and Secure Machine Learning Systems (EDSMLS 2018)
Evaluation Setup
● kNN:
○ 19 implementations (including Weka, RapidMiner, and KNIME)
○ Parameters: k = 1, Euclidean-distance metric
○ 3 data sets: Iris, Breast Cancer Wisconsin (BCW), Glass Identification
(Glass)
● Naive Bayes (NB):
○ 7 implementations (including Weka, RapidMiner, and KNIME)
○ Parameters: none
○ 3 data sets: Breast Cancer Wisconsin (BCW), Haberman’s Survival Data
(Haberman), Hayes-Roth (Hayes)
● Randomly split each data set into training and test set with the ratio of 4:1
● The data sets contain about 1000 instances in total
Effectiveness of Majority Oracle
Overall, 20.5% of the tests are deviating tests, and 97.5% of the
deviating tests reveal faults
Algorithm | Majority oracle: Deviating Tests (%) | Majority oracle: Fault-Revealing Tests (%) | #Faults
kNN | 23.84% | 100.00% | 13
NB | 16.29% | 94.31% | 16
kNN+NB | 20.50% | 97.50% | 29
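The majority-oracle idea can be sketched in a few lines of Python. This is an illustration under assumptions rather than the paper's tooling: three toy 1-NN implementations stand in for the real libraries, one of them carries a seeded fault (returning the first element without sorting by distance), and the function `majority_oracle` with its abstain-on-no-majority rule is hypothetical.

```python
from collections import Counter

def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def knn1_min(train, query):
    return min(train, key=lambda item: sq_dist(item[0], query))[1]

def knn1_sorted(train, query):
    return sorted(train, key=lambda item: sq_dist(item[0], query))[0][1]

def knn1_unsorted_bug(train, query):
    # Seeded fault: takes the first training element without sorting by distance.
    return train[0][1]

def majority_oracle(implementations, train, query):
    """Return (majority_output, names_of_deviating_implementations).

    The majority output serves as the expected output for this test input.
    If no output has a strict majority, the oracle abstains: (None, []).
    """
    outputs = {name: impl(train, query) for name, impl in implementations.items()}
    label, votes = Counter(outputs.values()).most_common(1)[0]
    if votes * 2 <= len(outputs):
        return None, []
    return label, sorted(n for n, out in outputs.items() if out != label)

impls = {"min": knn1_min, "sorted": knn1_sorted, "unsorted_bug": knn1_unsorted_bug}
train = [((0, 0), "a"), ((5, 5), "b")]

print(majority_oracle(impls, train, (4, 4)))  # ('b', ['unsorted_bug'])
print(majority_oracle(impls, train, (1, 1)))  # ('a', [])
```

The query `(4, 4)` is a deviating test because one implementation disagrees with the majority; the query `(1, 1)` is not, even though the faulty implementation is still faulty, which shows why only some tests reveal a given fault.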
Fault Example 1 (in kNN)
● Returns NaN
● !(max==min) should be !(maxValue==minValue)
Fault Example 2 (in kNN)
● When k = 1, the method returns the first element without sorting
Others’ Work at Columbia/Lehigh U.:
SOSP 2017 Best Paper Award
http://www.cs.columbia.edu/~suman/docs/deepxplore.pdf
https://github.com/peikexin9/deepxplore
Others’ Work at Columbia U./UVa: ICSE 2018
https://arxiv.org/pdf/1708.08559.pdf
Our Most Recent Work:
“Testing” a Classifier (aka Adversarial Machine Learning)
Malware Detection in Adversarial Settings:
Exploiting Feature Evolutions and Confusions in Android Apps
Wei Yang, Deguang Kong, Tao Xie, and Carl A. Gunter
Annual Computer Security Applications Conference (ACSAC 2017)
http://taoxie.cs.illinois.edu/publications/acsac17-malware.pdf
Evasion attack on classifiers
• Goals: Understand classifier robustness;
Generate testing samples to help build better classifiers.
• Example:
Generating adversarial examples helps build better classifiers
Figure Credit: Goodfellow 2016
Three practical constraints to craft a realistic
attack against mobile malware classifiers
• Preserving Malicious Behaviors.
• Maintaining the Robustness of Apps.
• Evading Malware Detectors.
Malware Recomposition Variation (MRV)
• Malware Evolution Attack
• Malware Confusion Attack
• Insight
• Follow existing patterns!
• In our mutation strategies, the feature
patterns are extracted from existing malware
evolution histories and existing evasive
malware.
Figure Credit: Trend Micro
Figure Credit: Malware News
Why MRV works
• Large feature set has numerous non-informative or even misleading
features.
• Insight 1: Malware detectors often confuse non-essential features in code
clones as discriminative features.
• Insight 2: Using a universal set of features for all malware families would
result in a large number of non-essential features to characterize each
family.
Feature Model
• A substitute model
• Resource Temporal Locale
Dependency model
• Summarize the essential features
and contextual features commonly
used in malware detection
• Transferability property
[Figure: attack workflow diagram with the labels Labeled Data, Train, Substitute Model, Adversarial crafting, Adversarial Samples, Attack, Target Model, Classify]
Approach
• Mutation strategy synthesis:
• Phylogenetic analysis for evolution
attack
• Similarity metric for confusion attack
• Program mutation
• Program transplantation/refactoring
Practicability of attacks
• Check the preservation of malicious behaviors
• Our impact analysis is based on the insight that the component-based nature
of Android constrains the impact of mutations within certain components
• Check the robustness of mutated apps
• Each mutated app was tested against 5,000 events randomly generated by
Monkey to ensure that the app does not crash
Evaluation
• Malware detection techniques:
• AppContext, a malware detector leveraging semantic features extracted from
call graphs and control-flow graphs.
• Drebin, a malware detector leveraging eight categories of features that reside
either in the manifest file or in the disassembled code.
• Subjects: 1,917 malware and 1,935 benign apps
• Baseline:
• OCTOPUS, a syntactic app obfuscation tool similar to DroidChameleon.
• Random MRV
Results - Defeating existing malware detection
• ORI: original test dataset
• MRV: test dataset with adversarial samples
Results – Comparing with Baselines
• MRV produces much more evasive variants than both OCTOPUS and Random
MRV for all three tools, especially the learning-based tools
• Random MRV generates more than 320,000 variants, but only 212 of them
can run without crashing (and only 2 can evade detection of AppContext).
Strengthening the robustness of detection
• Adversarial Training
• We randomly choose half of our generated malware variants and add them to the training set to train the model
• Variant Detector
• We create a new classifier, called a variant detector, to detect whether an app is a variant derived from existing malware.
• Weight Bounding
• We constrain the weight on a few dominant features to make feature
weights more evenly distributed.
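The intuition behind weight bounding can be shown with a toy linear scorer in Python. Everything here is an assumption made for illustration: the feature names, the weights, the bound of 1.0, and the post-hoc clipping in `bound_weights` (in practice such a constraint would more likely be enforced during training). It is not the detectors' actual model.

```python
def bound_weights(weights, bound):
    """Clip each feature weight into [-bound, bound] (hypothetical post-hoc bounding)."""
    return {f: max(-bound, min(bound, w)) for f, w in weights.items()}

def score(weights, features):
    """Linear score: sum of the weights of the features present in the app."""
    return sum(weights.get(f, 0.0) for f in features)

# Hypothetical weights: one dominant feature and three minor ones.
weights = {"dominant": 5.0, "minor_1": 0.5, "minor_2": 0.75, "minor_3": 0.25}
bounded = bound_weights(weights, bound=1.0)

original = ["dominant", "minor_1", "minor_2", "minor_3"]
variant = ["minor_1", "minor_2", "minor_3"]   # same app with the dominant feature removed

drop_unbounded = score(weights, original) - score(weights, variant)
drop_bounded = score(bounded, original) - score(bounded, variant)
print(drop_unbounded)  # 5.0
print(drop_bounded)    # 1.0
```

With bounded weights, removing a single dominant feature moves the score far less, so a variant has to change more features to cross the decision threshold.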
Results: strengthening the robustness of detection
Thank You!
Q & A
This work was supported in part by NSF under grants no. CCF-1409423, CNS-1434582, CNS-1513939, CNS-1564274.
Artificial Intelligence  Software Engineering
Artificial
Intelligence
Software
Engineering
Intelligent Software Engineering
Intelligent Software Engineering

More Related Content

What's hot

Software Analytics: Data Analytics for Software Engineering
Software Analytics: Data Analytics for Software EngineeringSoftware Analytics: Data Analytics for Software Engineering
Software Analytics: Data Analytics for Software EngineeringTao Xie
 
ACM Chicago March 2019 meeting: Software Engineering and AI - Prof. Tao Xie, ...
ACM Chicago March 2019 meeting: Software Engineering and AI - Prof. Tao Xie, ...ACM Chicago March 2019 meeting: Software Engineering and AI - Prof. Tao Xie, ...
ACM Chicago March 2019 meeting: Software Engineering and AI - Prof. Tao Xie, ...ACM Chicago
 
Towards Mining Software Repositories Research that Matters
Towards Mining Software Repositories Research that MattersTowards Mining Software Repositories Research that Matters
Towards Mining Software Repositories Research that MattersTao Xie
 
TechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - Trivadis
TechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - TrivadisTechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - Trivadis
TechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - TrivadisTrivadis
 
Publish or Perish: Questioning the Impact of Our Research on the Software Dev...
Publish or Perish: Questioning the Impact of Our Research on the Software Dev...Publish or Perish: Questioning the Impact of Our Research on the Software Dev...
Publish or Perish: Questioning the Impact of Our Research on the Software Dev...Margaret-Anne Storey
 
Pathways to Technology Transfer and Adoption: Achievements and Challenges
Pathways to Technology Transfer and Adoption: Achievements and ChallengesPathways to Technology Transfer and Adoption: Achievements and Challenges
Pathways to Technology Transfer and Adoption: Achievements and ChallengesTao Xie
 
HR Analytics: Using Machine Learning to Predict Employee Turnover - Matt Danc...
HR Analytics: Using Machine Learning to Predict Employee Turnover - Matt Danc...HR Analytics: Using Machine Learning to Predict Employee Turnover - Matt Danc...
HR Analytics: Using Machine Learning to Predict Employee Turnover - Matt Danc...Sri Ambati
 
Predicting More from Less: Synergies of Learning
Predicting More from Less: Synergies of LearningPredicting More from Less: Synergies of Learning
Predicting More from Less: Synergies of LearningCS, NcState
 
Why is TDD so hard for Data Engineering and Analytics Projects?
Why is TDD so hard for Data Engineering and Analytics Projects?Why is TDD so hard for Data Engineering and Analytics Projects?
Why is TDD so hard for Data Engineering and Analytics Projects?Phil Watt
 
Why is Test Driven Development for Analytics or Data Projects so Hard?
Why is Test Driven Development for Analytics or Data Projects so Hard?Why is Test Driven Development for Analytics or Data Projects so Hard?
Why is Test Driven Development for Analytics or Data Projects so Hard?Phil Watt
 
Software Engineering for ML/AI, keynote at FAS*/ICAC/SASO 2019
Software Engineering for ML/AI, keynote at FAS*/ICAC/SASO 2019Software Engineering for ML/AI, keynote at FAS*/ICAC/SASO 2019
Software Engineering for ML/AI, keynote at FAS*/ICAC/SASO 2019Patrizio Pelliccione
 
Mindtrek 2015 - Tampere Finland
Mindtrek 2015 - Tampere Finland Mindtrek 2015 - Tampere Finland
Mindtrek 2015 - Tampere Finland Panos Fitsilis
 
It Does What You Say, Not What You Mean: Lessons From A Decade of Program Repair
It Does What You Say, Not What You Mean: Lessons From A Decade of Program RepairIt Does What You Say, Not What You Mean: Lessons From A Decade of Program Repair
It Does What You Say, Not What You Mean: Lessons From A Decade of Program RepairClaire Le Goues
 
PhD Proposal talk
PhD Proposal talkPhD Proposal talk
PhD Proposal talkRay Buse
 
Implications of Open Source Software Use (or Let's Talk Open Source)
Implications of Open Source Software Use (or Let's Talk Open Source)Implications of Open Source Software Use (or Let's Talk Open Source)
Implications of Open Source Software Use (or Let's Talk Open Source)Gail Murphy
 
Opinion Mining for Software Engineering
Opinion Mining for Software EngineeringOpinion Mining for Software Engineering
Opinion Mining for Software EngineeringAlexander Serebrenik
 
Empirical evaluation in 2020: how big, how beautiful?
Empirical evaluation in 2020: how big, how beautiful?Empirical evaluation in 2020: how big, how beautiful?
Empirical evaluation in 2020: how big, how beautiful?Massimiliano Di Penta
 
Real-World Recommender Systems for Academia: The Pain and Gain in Building, O...
Real-World Recommender Systems for Academia: The Pain and Gain in Building, O...Real-World Recommender Systems for Academia: The Pain and Gain in Building, O...
Real-World Recommender Systems for Academia: The Pain and Gain in Building, O...Joeran Beel
 

What's hot (20)

Software Analytics: Data Analytics for Software Engineering
Software Analytics: Data Analytics for Software EngineeringSoftware Analytics: Data Analytics for Software Engineering
Software Analytics: Data Analytics for Software Engineering
 
ACM Chicago March 2019 meeting: Software Engineering and AI - Prof. Tao Xie, ...
ACM Chicago March 2019 meeting: Software Engineering and AI - Prof. Tao Xie, ...ACM Chicago March 2019 meeting: Software Engineering and AI - Prof. Tao Xie, ...
ACM Chicago March 2019 meeting: Software Engineering and AI - Prof. Tao Xie, ...
 
Towards Mining Software Repositories Research that Matters
Towards Mining Software Repositories Research that MattersTowards Mining Software Repositories Research that Matters
Towards Mining Software Repositories Research that Matters
 
TechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - Trivadis
TechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - TrivadisTechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - Trivadis
TechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - Trivadis
 
Software bug prediction
Software bug prediction Software bug prediction
Software bug prediction
 
Publish or Perish: Questioning the Impact of Our Research on the Software Dev...
Publish or Perish: Questioning the Impact of Our Research on the Software Dev...Publish or Perish: Questioning the Impact of Our Research on the Software Dev...
Publish or Perish: Questioning the Impact of Our Research on the Software Dev...
 
Se research update
Se research updateSe research update
Se research update
 
Pathways to Technology Transfer and Adoption: Achievements and Challenges
Pathways to Technology Transfer and Adoption: Achievements and ChallengesPathways to Technology Transfer and Adoption: Achievements and Challenges
Pathways to Technology Transfer and Adoption: Achievements and Challenges
 
HR Analytics: Using Machine Learning to Predict Employee Turnover - Matt Danc...
HR Analytics: Using Machine Learning to Predict Employee Turnover - Matt Danc...HR Analytics: Using Machine Learning to Predict Employee Turnover - Matt Danc...
HR Analytics: Using Machine Learning to Predict Employee Turnover - Matt Danc...
 
Predicting More from Less: Synergies of Learning
Predicting More from Less: Synergies of LearningPredicting More from Less: Synergies of Learning
Predicting More from Less: Synergies of Learning
 
Why is TDD so hard for Data Engineering and Analytics Projects?
Why is TDD so hard for Data Engineering and Analytics Projects?Why is TDD so hard for Data Engineering and Analytics Projects?
Why is TDD so hard for Data Engineering and Analytics Projects?
 
Why is Test Driven Development for Analytics or Data Projects so Hard?
Why is Test Driven Development for Analytics or Data Projects so Hard?Why is Test Driven Development for Analytics or Data Projects so Hard?
Why is Test Driven Development for Analytics or Data Projects so Hard?
 
Software Engineering for ML/AI, keynote at FAS*/ICAC/SASO 2019
Software Engineering for ML/AI, keynote at FAS*/ICAC/SASO 2019Software Engineering for ML/AI, keynote at FAS*/ICAC/SASO 2019
Software Engineering for ML/AI, keynote at FAS*/ICAC/SASO 2019
 
Mindtrek 2015 - Tampere Finland
Mindtrek 2015 - Tampere Finland Mindtrek 2015 - Tampere Finland
Mindtrek 2015 - Tampere Finland
 
It Does What You Say, Not What You Mean: Lessons From A Decade of Program Repair
It Does What You Say, Not What You Mean: Lessons From A Decade of Program RepairIt Does What You Say, Not What You Mean: Lessons From A Decade of Program Repair
It Does What You Say, Not What You Mean: Lessons From A Decade of Program Repair
 
PhD Proposal talk
PhD Proposal talkPhD Proposal talk
PhD Proposal talk
 
Implications of Open Source Software Use (or Let's Talk Open Source)
Implications of Open Source Software Use (or Let's Talk Open Source)Implications of Open Source Software Use (or Let's Talk Open Source)
Implications of Open Source Software Use (or Let's Talk Open Source)
 
Opinion Mining for Software Engineering
Opinion Mining for Software EngineeringOpinion Mining for Software Engineering
Opinion Mining for Software Engineering
 
Empirical evaluation in 2020: how big, how beautiful?
Empirical evaluation in 2020: how big, how beautiful?Empirical evaluation in 2020: how big, how beautiful?
Empirical evaluation in 2020: how big, how beautiful?
 
Real-World Recommender Systems for Academia: The Pain and Gain in Building, O...
Real-World Recommender Systems for Academia: The Pain and Gain in Building, O...Real-World Recommender Systems for Academia: The Pain and Gain in Building, O...
Real-World Recommender Systems for Academia: The Pain and Gain in Building, O...
 

Similar to ISEC'18 Keynote: Intelligent Software Engineering: Synergy between AI and Software Engineering

Intelligent Software Engineering: Synergy between AI and Software Engineering
Intelligent Software Engineering: Synergy between AI and Software EngineeringIntelligent Software Engineering: Synergy between AI and Software Engineering
Intelligent Software Engineering: Synergy between AI and Software EngineeringTao Xie
 
The Road to Data-Informed Agile Development Processes
The Road to Data-Informed Agile Development ProcessesThe Road to Data-Informed Agile Development Processes
The Road to Data-Informed Agile Development ProcessesChristoph Matthies
 
PriyankaDighe_Resume_new
PriyankaDighe_Resume_newPriyankaDighe_Resume_new
PriyankaDighe_Resume_newPriyanka Dighe
 
Tds — big science dec 2021
Tds — big science dec 2021Tds — big science dec 2021
Tds — big science dec 2021Gérard Dupont
 
Official resume titash_mandal_
Official resume titash_mandal_Official resume titash_mandal_
Official resume titash_mandal_Titash Mandal
 
Advancing Foundation and Practice of Software Analytics
Advancing Foundation and Practice of Software AnalyticsAdvancing Foundation and Practice of Software Analytics
Advancing Foundation and Practice of Software AnalyticsTao Xie
 
AnupDudaniDataScience2015
AnupDudaniDataScience2015AnupDudaniDataScience2015
AnupDudaniDataScience2015Anup Dudani
 
Using R for Classification of Large Social Network Data
Using R for Classification of Large Social Network DataUsing R for Classification of Large Social Network Data
Using R for Classification of Large Social Network DataIJCSIS Research Publications
 
Neural Models for Information Retrieval
Neural Models for Information RetrievalNeural Models for Information Retrieval
Neural Models for Information RetrievalBhaskar Mitra
 
Human computer interaction research at ibm t
Human computer interaction research at ibm tHuman computer interaction research at ibm t
Human computer interaction research at ibm tJohn Thomas
 
Big(ger) Data in Software Engineering
Big(ger) Data in Software EngineeringBig(ger) Data in Software Engineering
Big(ger) Data in Software EngineeringMehdi Mirakhorli
 
The Future of Data Science
The Future of Data ScienceThe Future of Data Science
The Future of Data ScienceDataWorks Summit
 
Big Data: the weakest link
Big Data: the weakest linkBig Data: the weakest link
Big Data: the weakest linkCS, NcState
 
2019 DSA 105 Introduction to Data Science Week 4
2019 DSA 105 Introduction to Data Science Week 42019 DSA 105 Introduction to Data Science Week 4
2019 DSA 105 Introduction to Data Science Week 4Ferdin Joe John Joseph PhD
 
Graphs & Neo4j - Past Present Future
Graphs & Neo4j - Past Present FutureGraphs & Neo4j - Past Present Future
Graphs & Neo4j - Past Present Futurejexp
 
Software Sustainability: Better Software Better Science
Software Sustainability: Better Software Better ScienceSoftware Sustainability: Better Software Better Science
Software Sustainability: Better Software Better ScienceCarole Goble
 

ISEC'18 Keynote: Intelligent Software Engineering: Synergy between AI and Software Engineering

  • 1. Intelligent Software Engineering: Synergy between AI and Software Engineering Tao Xie University of Illinois at Urbana-Champaign taoxie@illinois.edu http://taoxie.cs.illinois.edu/ Innovations in Software Engineering Conference (ISEC 2018) Feb 9-11 2018, Hyderabad, India
  • 2. Artificial Intelligence and Software Engineering: Intelligent Software Engineering [diagram]
  • 3. 1st International Workshop on Intelligent Software Engineering (WISE 2017), co-located with ASE 2017 • Organizing Committee: Tao Xie (University of Illinois at Urbana-Champaign, USA); Abhik Roychoudhury (National University of Singapore, Singapore); Wolfram Schulte (Facebook, USA); Qianxiang Wang (Huawei, China) • Sponsor: [logo] https://isofteng.github.io/wise2017/
  • 4. Workshop Program, International Workshop on Intelligent Software Engineering (WISE 2017) • 8 invited speakers • 1 panel discussion https://isofteng.github.io/wise2017/
  • 5. Artificial Intelligence and Software Engineering: Intelligent Software Engineering [diagram]
  • 6. Past: Automated Software Testing • 10 years of collaboration with Microsoft Research on Pex • .NET test generation tool based on dynamic symbolic execution • Example challenges • Path explosion [DSN’09: Fitnex] • Method sequence explosion [OOPSLA’11: Seeker] • Shipped in Visual Studio 2015/2017 Enterprise Edition as IntelliTest • Code Hunt [ICSE’15 JSEET] w/ > 6 million (6,114,978) users after 3.5 years (including registered users playing on www.codehunt.com, anonymous users, and accounts that access http://api.codehunt.com/ directly via the documented REST APIs) https://www.codehunt.com/ http://taoxie.cs.illinois.edu/publications/ase14-pexexperiences.pdf
  • 7. Past: Android App Testing • 2 years of collaboration with the Tencent Inc. WeChat testing team • Guided random test generation tool improved over Google Monkey • Resulting tool deployed in daily WeChat testing practice • WeChat = WhatsApp + Facebook + Instagram + PayPal + Uber … • Monthly active users: 963 million (2017 Q2) • Daily: dozens of billions of messages sent, hundreds of millions of photos uploaded, hundreds of millions of payment transactions executed • First studies on testing industrial Android apps [FSE’16IN][ICSE’17SEIP] • Beyond the open-source Android apps that academia focuses on http://taoxie.cs.illinois.edu/publications/esecfse17industry-replay.pdf http://taoxie.cs.illinois.edu/publications/fse16industry-wechat.pdf
  • 8. Next: Intelligent Software Testing(?) • Learning from others working on the same things • Our work on mining API usage method sequences to test the API [ESEC/FSE’09: MSeqGen] • Visser et al. Green: Reducing, reusing and recycling constraints in program analysis. FSE’12. • Learning from others working on similar things • Jia et al. Enhancing reuse of constraint solutions to improve symbolic execution. ISSTA’15. • Aquino et al. Heuristically Matching Solution Spaces of Arithmetic Formulas to Efficiently Reuse Solutions. ICSE’17. [Jia et al. ISSTA’15]
  • 9. Mining and Understanding Software Enclaves (MUSE), DARPA http://materials.dagstuhl.de/files/15/15472/15472.SureshJagannathan1.Slides.pdf
  • 10. Pliny: Mining Big Code to help programmers (Rice U., UT Austin, Wisconsin, GrammaTech), $11 million (4 years) http://pliny.rice.edu/ http://news.rice.edu/2014/11/05/next-for-darpa-autocomplete-for-programmers-2/
  • 11. Program Synthesis: NSF Expeditions in Computing, 10 million (5 years) https://excape.cis.upenn.edu/ https://www.sciencedaily.com/releases/2016/08/160815134941.htm
  • 12. Software-related data are pervasive • Runtime traces, program logs, system events, perf counters, … • Usage logs, user surveys, online forum posts, blogs & Twitter, … • Source code, bug history, check-in history, test cases, keystrokes, …
  • 13. Software Analytics (in collaboration with Microsoft Research Asia) • Software analytics is to enable software practitioners to perform data exploration and analysis in order to obtain insightful and actionable information for data-driven tasks around software and services. http://taoxie.cs.illinois.edu/publications/malets11-analytics.pdf
  • 14. Past: Software Analytics (in collaboration with Microsoft Research Asia) • StackMine [ICSE’12, IEEESoft’13]: performance debugging in the large • Data Source: Performance call stack traces from Windows end users • Analytics Output: Ranked clusters of call stack traces based on shared patterns • Impact: Deployed/used in daily practice of Windows Performance Analysis team • XIAO [ACSAC’12, ICSE’17 SEIP]: code-clone detection and search • Data Source: Source code repos (+ given code segment optionally) • Analytics Output: Code clones • Impact: Shipped in Visual Studio 2012; deployed/used in daily practice of Microsoft Security Response Center
  • 15. Past: Software Analytics • Service Analysis Studio [ASE’13-EX]: service incident management • Data Source: Transaction logs, system metrics, past incident reports • Analytics Output: Healing suggestions/likely root causes of the given incident • Impact: Deployed and used by an important Microsoft service (hundreds of millions of users) for incident management In Collaboration with Microsoft Research Asia
  • 16. Next: Intelligent Software Analytics(?) • Microsoft Research Asia, Software Analytics Group: Smart Data Discovery • IN4: INteractive, Intuitive, Instant, INsights • Quick Insights -> Microsoft Power BI • Gartner Magic Quadrant for Business Intelligence & Analytics Platforms
  • 17. Microsoft Research Asia - Software Analytics Group https://www.hksilicon.com/articles/1213020
  • 18. Deep Learning for NL→Regex: Get Real! • Existing approaches on NL → regular expressions: [Ranta 1998], [Kushman and Barzilay 2013], [Locascio et al. 2016] • Used only synthetic data for training and testing • Are these approaches effective in addressing real-world situations? Zhong et al. Generating Regular Expressions from Natural Language Specifications: Are We There Yet? In AAAI 2018 Workshop on NLP for Software Engineering (NL4SE 2018) http://taoxie.cs.illinois.edu/publications/nl4se18-regex.pdf
  • 19. Characteristic Study • Synthetic datasets • KB13 [Kushman and Barzilay 2013] (824 pairs): write NL sentences to capture the example strings • NL-RX [Locascio et al., 2016] (10,000 pairs): parse a regex and generate initial NL sentences based on a predefined grammar, then paraphrase the generated sentences • Real-world dataset • RegexLib (3,619 pairs): from regexlib.com
  • 20. Complexity of regular expressions • Synthetic datasets support only a subset of the regex language: e.g., ‘?’ ∈ RegexLib, but ∉ NL-RX or KB13 • [Figure: length statistics of regular expressions]
  • 21. Complexity of NL sentences • # of distinct words: 13,491 (RegexLib) vs 715 (KB13) vs 560 (NL-RX) • [Figure: #words statistics of NL sentences]
  • 22. Experimental Study • Deep-Regex [Locascio et al. 2016] • Regular expression generation treated as machine translation • Sequence-to-sequence learning https://github.com/nicholaslocascio/deep-regex https://aclweb.org/anthology/D/D16/D16-1197.pdf
  • 23. Effectiveness on Synthetic Datasets • String-Equal: exact string match • DFA-Equal: semantic match (DFA: Deterministic Finite Automaton)
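The difference between the two metrics on slide 23 can be illustrated with a small sketch. It does not build DFAs; as a stated simplification, it approximates semantic equivalence by comparing two regexes on every string up to a bounded length over a tiny alphabet. The function names, the alphabet, and the length bound are all assumptions made for illustration.

```python
import re
from itertools import product


def string_equal(regex_a: str, regex_b: str) -> bool:
    """String-Equal: the two regexes are character-for-character identical."""
    return regex_a == regex_b


def bounded_equivalent(regex_a: str, regex_b: str,
                       alphabet: str = "ab", max_len: int = 6) -> bool:
    """Approximate semantic equality: both regexes accept exactly the same
    strings among all strings over `alphabet` of length <= max_len.
    This is a bounded check, not a true DFA equivalence test."""
    pattern_a, pattern_b = re.compile(regex_a), re.compile(regex_b)
    for length in range(max_len + 1):
        for chars in product(alphabet, repeat=length):
            candidate = "".join(chars)
            accepted_a = pattern_a.fullmatch(candidate) is not None
            accepted_b = pattern_b.fullmatch(candidate) is not None
            if accepted_a != accepted_b:
                return False
    return True


gold = "(a|b)*"
predicted = "[ab]*"
same_text = string_equal(gold, predicted)
same_language = bounded_equivalent(gold, predicted)
```

Here the two regexes differ as strings but accept the same bounded language, which is the kind of prediction a semantic metric credits and an exact-match metric does not.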
  • 24. Experiments on Real-world Dataset • Experiment settings • Use Deep-Regex to train a model on the synthetic NL-RX dataset • Build a test set (1,091 pairs) from RegexLib • Eliminate long NL sentences • Results • Without beam search: cannot generate any correct regex • Beam search (size: 20): generates correct regexes for 5 NL sentences (0.46%) • Huge drop in top-20 accuracy (90.9% → 0.46%)
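The 0.46% figure on slide 24 follows directly from the counts given there (5 correct out of 1,091 test pairs). A minimal arithmetic check, with variable names chosen only for illustration:

```python
def accuracy(correct: int, total: int) -> float:
    """Fraction of test pairs for which a correct regex was generated."""
    return correct / total


# Counts taken from the slide: 5 correct NL sentences out of 1,091 pairs.
real_world_top20 = accuracy(5, 1091)
# Top-20 accuracy reported on the synthetic data, as stated on the slide.
synthetic_top20 = 0.909

real_world_percent = f"{real_world_top20:.2%}"
drop_in_points = round((synthetic_top20 - real_world_top20) * 100, 2)
```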
  • 25. New Causes of Errors on Real-world Dataset • Variations of NL sentences • NL-RX: NL sentences are generated from a predefined grammar • Augmenting training data may alleviate the error • Numerical range • Description: Match the numbers 100 to 199. Ground truth: 1[0-9][0-9] Predicted result: ([0-9])*
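To make the numerical-range error on slide 25 concrete, the sketch below runs the ground-truth and predicted regexes from the slide over the integers 0 to 999 and collects the inputs on which they disagree. The helper name and the chosen range are assumptions for illustration.

```python
import re

# Both regexes are copied from the slide.
ground_truth = re.compile(r"1[0-9][0-9]")
predicted = re.compile(r"([0-9])*")


def disagreements(low: int, high: int) -> list[int]:
    """Integers in [low, high) that exactly one of the two regexes accepts."""
    differing = []
    for number in range(low, high):
        text = str(number)
        accepted_by_truth = ground_truth.fullmatch(text) is not None
        accepted_by_prediction = predicted.fullmatch(text) is not None
        if accepted_by_truth != accepted_by_prediction:
            differing.append(number)
    return differing


differing_numbers = disagreements(0, 1000)
```

The predicted regex accepts any digit string, so the two agree only inside 100 to 199 and disagree on every other number in the range.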
  • 26. Ongoing Work: Large Real-world Benchmark • RegexLib is too sparse to be a sufficient training set • Collect sufficient labeled real-world data • Synthesize data to supplement the collected real-world data • Dataset sizes: NL-RX: 10,000 pairs, 560 distinct words; RegexLib: 3,619 pairs, 13,491 distinct words
  • 27. Ongoing Work: Testability of Regular Expressions • String test cases can handle the ambiguity of NL sentences • String test cases can differentiate regular expression candidates and help select the best candidate during beam search • Description: Items with a small letter preceding “dog”, at least thrice. Ground truth: ([a-b].*dog.*){3,} Predicted result: ([a-b]).*((dog){3,}) Test case: “adogadogadog”
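One way to picture how a string test case could help pick among beam-search candidates is sketched below. The two regexes and the test string come from the slide; the selection function, its name, and the assumption that candidates arrive ranked by model score are hypothetical and are not taken from the talk.

```python
import re
from typing import Iterable, Optional, Sequence


def passes(regex: str, positives: Iterable[str], negatives: Iterable[str]) -> bool:
    """True if the regex accepts every positive string and no negative string."""
    pattern = re.compile(regex)
    accepts_all = all(pattern.fullmatch(s) is not None for s in positives)
    rejects_all = all(pattern.fullmatch(s) is None for s in negatives)
    return accepts_all and rejects_all


def select_candidate(ranked_candidates: Sequence[str],
                     positives: Sequence[str],
                     negatives: Sequence[str] = ()) -> Optional[str]:
    """Return the highest-ranked candidate consistent with the test cases."""
    for regex in ranked_candidates:
        if passes(regex, positives, negatives):
            return regex
    return None


# Assumed beam output: the model ranks the wrong prediction first.
candidates = [r"([a-b]).*((dog){3,})", r"([a-b].*dog.*){3,}"]
chosen = select_candidate(candidates, positives=["adogadogadog"])
```

The top-ranked prediction requires three consecutive copies of "dog", so it rejects the test string and the second candidate is chosen instead.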
  • 29. AI for SE Startups Rooted in Research • Diffblue: Oxford University spin-off, Daniel Kroening et al. http://www.diffblue.com/ • aiXcoder: Peking University spin-off, Ge Li et al. http://aixcoder.com/ • Codota: Technion spin-off, Eran Yahav et al. https://www.codota.com/ • Qualicen: Technical University Munich spin-off, Benedikt Hauptmann et al. https://www.qualicen.de/en/
  • 30. Open Topics in Intelligent Software Engineering (ISE) • How to determine whether a software engineering tool is indeed “intelligent”? • A Turing test for such a tool? • Which sub-areas/problems in ISE should the research community invest effort in as a high priority? • How to turn ISE research results into industrial/open-source practice? • …
  • 31. Artificial Intelligence and Software Engineering: Intelligent Software Engineering [diagram]
  • 32. White-House-Sponsored Workshop (2016 June 28) http://www.cmu.edu/safartint/
  • 33. Self-Driving Tesla Involved in Fatal Crash (2016 June 30) http://www.nytimes.com/2016/07/01/business/self-driving-tesla-fatal-crash-investigation.html “A Tesla car in autopilot crashed into a trailer because the autopilot system failed to recognize the trailer as an obstacle due to its ‘white color against a brightly lit sky’ and the ‘high ride height’” http://www.cs.columbia.edu/~suman/docs/deepxplore.pdf
  • 34. Microsoft's Teen Chatbot Tay Turned into Genocidal Racist (2016 March 23/24) http://www.businessinsider.com/ai-expert-explains-why-microsofts-tay-chatbot-is-so-racist-2016-3 "There are a number of precautionary steps they [Microsoft] could have taken. It wouldn't have been too hard to create a blacklist of terms; or narrow the scope of replies. They could also have simply manually moderated Tay for the first few days, even if that had meant slower responses." “businesses and other AI developers will need to give more thought to the protocols they design for testing and training AIs like Tay.”
  • 35. NSF New Program: Formal Methods in the Field • Anticipated funding: $8 million; #awards: 8 • Deadline: May 8th 2018 Machine Learning: The sheer complexity of machine learning algorithms and their applications makes it hard to ensure correctness. Exploration of new formal methods can be used to characterize boundaries of behavior, and may bring much needed rigor to machine learning algorithms and applications. These techniques could range from novel programming languages and compilers for more robust machine learning to formal verification techniques for machine learning systems that could provide assurances of safety, correctness, and fairness. The interplay between program synthesis and machine learning offers many interesting possibilities to both improve machine learning and formal techniques. https://www.nsf.gov/pubs/2018/nsf18536/nsf18536.htm
  • 36. Problems in Testing ML Software ● ML software suffers from the “no oracle problem” ○ Previous approach @Columbia U. on metamorphic testing: check satisfaction of a property with different inputs in equivalence classes https://medium.com/trustableai/testing-ai-with-metamorphic-testing-61d690001f5c ● Inaccuracy may be desirable to avoid the overfitting problem ● Auto-generated test inputs have no expected outputs
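As an illustration of the metamorphic-testing idea on slide 36, the sketch below checks one relation for a toy 1-nearest-neighbour classifier: shuffling the training set should not change any prediction (assuming no two training points are equidistant from a test point). The classifier, the relation, and the data are all invented for this example and are not taken from the cited work.

```python
import math
import random

# Each training row is (feature vector, label).
Row = tuple[tuple[float, ...], str]


def predict_1nn(train: list[Row], point: tuple[float, ...]) -> str:
    """Label of the training row closest to `point` (Euclidean distance)."""
    nearest = min(train, key=lambda row: math.dist(row[0], point))
    return nearest[1]


def permutation_relation_holds(train: list[Row],
                               tests: list[tuple[float, ...]],
                               seed: int = 0) -> bool:
    """Metamorphic relation: predictions are unchanged when the order of the
    training rows is shuffled. No expected labels are needed for the check."""
    shuffled = train[:]
    random.Random(seed).shuffle(shuffled)
    return all(predict_1nn(train, p) == predict_1nn(shuffled, p) for p in tests)


train_rows = [((0.0, 0.0), "a"), ((1.0, 1.0), "a"),
              ((5.0, 5.0), "b"), ((6.0, 5.0), "b")]
test_points = [(0.2, 0.1), (5.4, 5.1), (2.0, 2.0)]
relation_ok = permutation_relation_holds(train_rows, test_points)
```

The check compares the program against itself under a transformed input, which is why it sidesteps the missing oracle.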
  • 37. Multiple-Implementation Testing • Srisakaokul et al. Multiple-Implementation Testing of Supervised Learning Software. In AAAI-18 Workshop on Engineering Dependable and Secure Machine Learning Systems (EDSMLS 2018) http://taoxie.cs.illinois.edu/publications/edsmls18-mitest.pdf
  • 38. Evaluation Setup ● kNN: ○ 19 implementations (including Weka, RapidMiner, and KNIME) ○ Parameters: k = 1, Euclidean-distance metric ○ 3 data sets: Iris, Breast Cancer Wisconsin (BCW), Glass Identification (Glass) ● Naive Bayes (NB): ○ 7 implementations (including Weka, RapidMiner, and KNIME) ○ Parameters: none ○ 3 data sets: Breast Cancer Wisconsin (BCW), Haberman’s Survival Data (Haberman), Hayes-Roth (Hayes) ● Randomly split each data set into training and test sets with a ratio of 4:1 ● The data sets contain about 1000 instances in total
  • 39. Effectiveness of Majority Oracle • Overall, 20.5% of the tests are deviating tests, and 97.5% of the deviating tests reveal faults • kNN: 23.84% majority-oracle deviating tests, 100.00% fault-revealing tests, 13 faults • NB: 16.29% majority-oracle deviating tests, 94.31% fault-revealing tests, 16 faults • kNN+NB: 20.50% majority-oracle deviating tests, 97.50% fault-revealing tests, 29 faults
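A majority oracle of the kind named on slide 39 can be sketched as follows: run the same test input through several implementations, take the label most of them agree on as the expected output, and flag the implementations that deviate. The strict-majority rule, the function name, and the implementation names are assumptions for illustration; the paper's exact procedure may differ.

```python
from collections import Counter
from typing import Optional


def majority_oracle(outputs: dict[str, str]) -> tuple[Optional[str], list[str]]:
    """Given {implementation name: predicted label} for one test input,
    return (majority label, names of deviating implementations).
    Returns (None, []) when no label has a strict majority."""
    counts = Counter(outputs.values())
    label, votes = counts.most_common(1)[0]
    if votes * 2 <= len(outputs):
        return None, []
    deviating = sorted(name for name, out in outputs.items() if out != label)
    return label, deviating


# Hypothetical outputs of three kNN implementations on one test instance.
outputs_for_one_test = {
    "impl_a": "setosa",
    "impl_b": "setosa",
    "impl_c": "virginica",
}
expected_label, deviating_impls = majority_oracle(outputs_for_one_test)
```

A test input with at least one deviating implementation is what the slide counts as a deviating test; whether the deviation is a real fault still has to be confirmed by inspection.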
  • 40. Effectiveness of Majority Oracle (cont.)
  • 41. Effectiveness of Majority Oracle (cont.)
  • 42. Fault Example 1 (in kNN) ● Returns NaN ● The guard !(max==min) should be !(maxValue==minValue)
  • 43. Effectiveness of Majority Oracle (cont.)
  • 44. Fault Example 2 (in kNN) ● When k = 1, the method returns the first element without sorting
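The fault described on this slide can be illustrated with a small reconstruction. The code below is hypothetical and written from the one-line description only; it is not the source of any implementation in the study. `knn_faulty` takes a shortcut for k = 1 that skips the distance sort, while `knn_fixed` always sorts by distance first.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_fixed(train, query, k):
    """Sort training instances by distance, then vote among the k nearest."""
    neighbors = sorted(train, key=lambda item: sq_dist(item[0], query))[:k]
    labels = [label for _, label in neighbors]
    # sorted(...) makes tie-breaking between equally frequent labels deterministic.
    return max(sorted(set(labels)), key=labels.count)


def knn_faulty(train, query, k):
    """Illustrative fault: the k == 1 shortcut returns the first element unsorted."""
    if k == 1:
        return train[0][1]  # BUG: first training instance, not the nearest one
    return knn_fixed(train, query, k)


# Hypothetical data: the nearest instance is deliberately not first in the list.
train = [((5.0, 5.0), "far"), ((0.0, 0.0), "near")]
query = (0.1, 0.1)
faulty_label = knn_faulty(train, query, k=1)
fixed_label = knn_fixed(train, query, k=1)
```

Because the evaluation setup uses k = 1, a fault of this shape is exercised by every test, and a majority of correct implementations would outvote it.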
  • 45. Others’ Work at Columbia/Lehigh U.: SOSP 2017 Best Paper Award http://www.cs.columbia.edu/~suman/docs/deepxplore.pdf https://github.com/peikexin9/deepxplore
  • 46. Others’ Work at Columbia U./UVa: ICSE 2018 https://arxiv.org/pdf/1708.08559.pdf
  • 47. Our Most Recent Work: “Testing” a Classifier (aka Adversarial Machine Learning). Malware Detection in Adversarial Settings: Exploiting Feature Evolutions and Confusions in Android Apps. Wei Yang, Deguang Kong, Tao Xie, and Carl A. Gunter. Annual Computer Security Applications Conference (ACSAC 2017) http://taoxie.cs.illinois.edu/publications/acsac17-malware.pdf
  • 48. Evasion attack on classifiers • Goals: Understand classifier robustness; generate testing samples to help build better classifiers. • Example: (figure)
  • 50. Three practical constraints to craft a realistic attack against mobile malware classifiers • Preserving Malicious Behaviors. • Maintaining the Robustness of Apps. • Evading Malware Detectors.
  • 51. Malware Recomposition Variation (MRV) • Malware Evolution Attack • Malware Confusion Attack • Insight • Follow existing patterns! • In our mutation strategies, the feature patterns are extracted from existing malware evolution histories and existing evasive malware. Figure Credit: Trend Micro; Figure Credit: Malware News
  • 52. Why MRV works • A large feature set has numerous non-informative or even misleading features. • Insight 1: Malware detectors often mistake non-essential features in code clones for discriminative features. • Insight 2: Using a universal set of features for all malware families results in a large number of non-essential features being used to characterize each family.
  • 53. Feature Model • A substitute model • Resource Temporal Locale Dependency model • Summarize the essential features and contextual features commonly used in malware detection • Transferability property [Figure labels: Target Model; Substitute Model; Adversarial Samples; Labeled Data; Train; Classify; Adversarial crafting; Attack]
  • 54. Approach • Mutation strategy synthesis: • Phylogenetic analysis for evolution attack • Similarity metric for confusion attack • Program mutation • Program transplantation/refactoring
  • 55. Practicability of attacks • Check that malicious behaviors are preserved • Our impact analysis is based on the insight that the component-based nature of Android confines the impact of mutations to certain components • Check the robustness of mutated apps • Each mutated app was tested against 5,000 events randomly generated by Monkey to ensure that the app does not crash
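A minimal sketch of how such a robustness check could be scripted with the Android Monkey tool, which is invoked as `adb shell monkey -p <package> <event-count>`. The helper names, the package name, and the crash-detection rule are assumptions for illustration; the slides do not describe the authors' actual harness.

```python
import subprocess


def monkey_command(package, events=5000, seed=None):
    """Build the adb command that sends `events` pseudo-random UI events to `package`."""
    cmd = ["adb", "shell", "monkey", "-p", package]
    if seed is not None:
        cmd += ["-s", str(seed)]  # fixed seed makes the event sequence repeatable
    cmd += ["-v", str(events)]
    return cmd


def survives_monkey(package, events=5000, seed=None, timeout=3600):
    """Run Monkey against an installed app; needs adb and a connected device or emulator.

    Assumption: a crash is detected from a non-zero exit status or a
    "// CRASH" line in Monkey's verbose output. Adjust to the adb version in use.
    """
    proc = subprocess.run(
        monkey_command(package, events, seed),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode == 0 and "// CRASH" not in proc.stdout


# Hypothetical package name; only the command is built here, nothing is executed.
example_cmd = monkey_command("com.example.mutatedapp", seed=42)
```

Passing a seed is a design choice that makes a crashing event sequence reproducible when a mutated app fails the check.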
  • 56. Evaluation • Malware detection techniques: • AppContext, a malware detector leveraging semantic features extracted from call graphs and control-flow graphs. • Drebin, a malware detector leveraging eight categories of features that reside either in the manifest file or in the disassembled code. • Subjects: 1,917 malware and 1,935 benign apps • Baselines: • OCTOPUS, a syntactic app obfuscation tool similar to DroidChameleon. • Random MRV
  • 57. Results - Defeating existing malware detection • ORI: Original test dataset. • MRV: Test dataset with adversarial samples.
  • 58. Results – Comparing with Baselines • MRV produces many more evasive variants than both OCTOPUS and Random MRV for all three tools, especially the learning-based tools
  • 59. Results – Comparing with Baselines • Random MRV generates more than 320,000 variants, but only 212 of them can run without crashing (and only 2 can evade detection by AppContext).
  • 60. Strengthening the robustness of detection • Adversarial Training • We randomly chose half of our generated malware variants and added them to the training set to train the model. • Variant Detector • We create a new classifier, called the variant detector, to detect whether an app is a variant derived from existing malware. • Weight Bounding • We constrain the weights on a few dominant features to make feature weights more evenly distributed.
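One possible reading of weight bounding, sketched under assumptions: a plain logistic-regression model trained by stochastic gradient descent, with every feature weight clipped to `[-bound, bound]` after each update. The slides do not say how the bound is enforced in the actual detectors, so the function name, the clipping rule, and the toy data are all hypothetical.

```python
import math


def train_bounded_logreg(samples, labels, bound, epochs=200, lr=0.5):
    """Logistic regression via SGD; each weight is clipped to [-bound, bound]."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            z = b + sum(wi * xi for wi, xi in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - y  # gradient of the log loss with respect to z
            # Gradient step followed by projection onto the box constraint.
            w = [max(-bound, min(bound, wi - lr * g * xi)) for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b


# Hypothetical data: feature 0 alone separates the classes (a "dominant" feature).
samples = [(1, 1, 0), (1, 0, 1), (1, 1, 1), (0, 1, 0), (0, 0, 1), (0, 0, 0)]
labels = [1, 1, 1, 0, 0, 0]

w_free, _ = train_bounded_logreg(samples, labels, bound=float("inf"))
w_bounded, _ = train_bounded_logreg(samples, labels, bound=1.0)
```

Without a bound, the weight on the dominant feature keeps growing, so an attacker who can flip that one feature flips the verdict; with the bound, the model has to spread its decision across more features.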
  • 61. Results: strengthening the robustness of detection
  • 62. Artificial Intelligence  Software Engineering Artificial Intelligence Software Engineering Intelligent Software Engineering Intelligent Software Engineering
  • 63. Thank You! Q & A. This work was supported in part by NSF under grants no. CCF-1409423, CNS-1434582, CNS-1513939, CNS-1564274.