 Software Engineering
 Automated Software Debugging
 Automated Code Search
 Automated Code Review
Masud Rahman, PhD
Assistant Professor
Faculty of Computer Science
Office: 218, Goldberg Building
Dalhousie University, Canada
masud.rahman@dal.ca
Interested in more details? Please visit: https://web.cs.dal.ca/~masud/raise
Masud Rahman: Academic Journey
Khulna University, Bangladesh (2005-2009)
University of Saskatchewan, Canada (2012-2019)
Polytechnique Montreal, Canada (2019-2020)
Dalhousie University, Canada (2020-2022)
Real Life Software Bugs & Failures
$1.7 trillion/year
(Global, 2017)
A software bug is a fault, error, or flaw in a program that
causes the program to behave unexpectedly (Wikipedia)
A Tale of Software Bugs & Features!
Find the bug
Understand the bug
Repair the bug/faulty code
Find the right code for a feature
Quality control of code-level changes
1
2
3
4
Bug Report & Bug Localization
Bug Localization
Example 1: Find the bug in software code
Stack traces
500 keywords
[Figure: trace graph; stack trace entries i and j map to class and method nodes (Ci, Mi) and (Cj, Mj), connected by static and hierarchical links]
Search Keyword Selection from Trace Graph
$$S(v_i) = (1 - \phi) + \phi \sum_{j \in In(v_i)} \frac{S(v_j)}{|Out(v_j)|}, \quad 0 \le \phi \le 1$$
[Figure: trace graph with class nodes Ci, Cj, Cp and method nodes Mk, Mn]
PageRank Algorithm
(Google)
ESEC/FSE 2018
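The voting idea behind this keyword selection can be sketched in a few lines of Python. This is a minimal illustration, not the implementation from the ESEC/FSE 2018 work: the `pagerank` function, the damping value of 0.85, the fixed iteration count, and the edges of the example graph are all assumptions made for the example. Only the node names are taken from the bug report used elsewhere in this talk.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Score nodes with S(v) = (1 - d) + d * sum(S(u) / |Out(u)|) over in-neighbours u."""
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    scores = {node: 1.0 for node in nodes}
    for _ in range(iterations):
        updated = {}
        for node in nodes:
            # Each in-neighbour splits its own score evenly among its out-links.
            incoming = sum(
                scores[src] / len(targets)
                for src, targets in graph.items()
                if node in targets
            )
            updated[node] = (1 - damping) + damping * incoming
        scores = updated
    return scores


# Hypothetical trace graph: an edge u -> v means that u "votes" for v.
trace_graph = {
    "EvaluationThread": ["run"],
    "run": ["execute"],
    "JDIValue": ["toString"],
    "toString": ["execute"],
}
scores = pagerank(trace_graph)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked[0])  # execute
```

In this toy graph `execute` ends up on top because it is voted for by two nodes that themselves receive votes, which is the recursive effect described for the algorithm.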
Find the bug using Information Retrieval
JDIValue, toString, execute,
EvaluationThread, run, NullPointerException
able cast null
Keyword selection
127 Words
1
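To make the effect of keyword selection concrete, here is a rough sketch of ranking source files against a query with a simple TF-IDF score. It is only a stand-in for a real search engine such as Lucene; the `rank_documents` function, the weighting formula, the file names, and the token lists are all invented for illustration.

```python
import math
from collections import Counter


def rank_documents(query_terms, documents):
    """Rank document names by a simple TF-IDF score against the query terms."""
    n_docs = len(documents)
    doc_freq = Counter()
    for terms in documents.values():
        doc_freq.update(set(terms))
    scores = {}
    for name, terms in documents.items():
        tf = Counter(terms)
        # Only query terms that occur in the document contribute to its score.
        scores[name] = sum(
            tf[t] * math.log(1 + n_docs / doc_freq[t])
            for t in query_terms
            if t in tf
        )
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical codebase: file name -> tokens found in that file.
codebase = {
    "JDIValue.java": ["JDIValue", "toString", "cast", "null"],
    "EvaluationThread.java": ["EvaluationThread", "run", "execute"],
    "Logger.java": ["log", "null", "able", "write"],
}
noisy_query = ["log", "write", "able", "null", "cast"]
keyword_query = ["JDIValue", "toString", "NullPointerException"]

print(rank_documents(noisy_query, codebase)[0])    # Logger.java
print(rank_documents(keyword_query, codebase)[0])  # JDIValue.java
```

With the generic words, an unrelated file wins; with a few well-chosen identifiers, the assumed buggy file moves to the top. The real experiments are of course far larger than this toy setup.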
Explain the bug (a.k.a., faulty software code)
• Rule-based explanation
• Not accurate
• Cryptic, hard to understand
Example 2: Explain a bug with GitHub
Bug-fix pull requests
Faulty code
Commit messages
Explain a bug with regular texts
Abstract Syntax Tree (AST)
Convert input dense tensor
Deep learning
Message
BufferedImage Grayscale ImageEdit ColorConvertOp File Transparency
ColorSpace BufferedImageOp Graphics ImageEffects
Convert image to gray scale without losing transparency
Example 3: Find the right software code
Candidate API Collection from Stack Overflow
PageRank TF-IDF
IndexColorModel
ColorSpaceType
BufferedImageOp
Gray
ImageEffects
JPEGResize
Color
IOException
Graphics
ColorConvertOp
Relevant Q&As
Code Elements
Candidate API List
Candidate API List
Convert image to Grayscale ….
ICSME 2018
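The collection step can be sketched as follows: pull class-like names out of answer texts with a regular expression and count them. This is a simplified sketch under stated assumptions. The `CAMEL_CASE` pattern and the `candidate_apis` helper are hypothetical, plain frequency counting stands in for the PageRank and TF-IDF selection named on the slide, and the answer texts are made up.

```python
import re
from collections import Counter

# Assumed pattern: CamelCase tokens with at least two capitalised parts,
# e.g. BufferedImage or ColorConvertOp. Names such as IOException or
# Graphics are not matched by this simple pattern.
CAMEL_CASE = re.compile(r"\b(?:[A-Z][a-z0-9]+){2,}\b")


def candidate_apis(answers, top_k=5):
    """Return the most frequent class-like names found in the answer texts."""
    counts = Counter()
    for text in answers:
        counts.update(CAMEL_CASE.findall(text))
    return [name for name, _ in counts.most_common(top_k)]


# Made-up answer snippets standing in for Stack Overflow Q&A threads.
answers = [
    "Use ColorConvertOp with a BufferedImage.",
    "Draw the BufferedImage onto a new BufferedImage with an alpha channel.",
    "ColorConvertOp and ColorSpace can drop transparency.",
]
print(candidate_apis(answers, top_k=2))  # ['BufferedImage', 'ColorConvertOp']
```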
Relevant API Selection with Borda Count
BORDA Count: A>B
if ∑rank(A) > ∑rank(B)
Borda Score
Candidate APIs by PageRank
Candidate APIs by TF-IDF
B: Donald
A: Joe
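A small sketch of Borda counting over two ranked candidate lists is given below. The point scheme (a list of n items gives n points to its first item, n - 1 to the second, and so on) is one common Borda variant and is an assumption here, as are the `borda_scores` name and the two example lists.

```python
def borda_scores(ranked_lists):
    """Sum positional points for each candidate across all ranked lists."""
    scores = {}
    for ranking in ranked_lists:
        n = len(ranking)
        for position, candidate in enumerate(ranking):
            # First place earns n points, last place earns 1 point.
            scores[candidate] = scores.get(candidate, 0) + (n - position)
    return scores


# Hypothetical candidate lists from the two keyword selection algorithms.
by_pagerank = ["BufferedImage", "ColorConvertOp", "Graphics", "IOException"]
by_tfidf = ["BufferedImage", "ColorConvertOp", "ColorSpace", "Graphics"]

scores = borda_scores([by_pagerank, by_tfidf])
print(scores["BufferedImage"])   # 8
print(scores["ColorConvertOp"])  # 6
```

A candidate that ranks high in both lists collects the most points, while one that appears in only a single list, such as `IOException` here, stays near the bottom.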
Relevant API Selection with Word Embedding
Semantic Proximity: A>B
if proximity(Q,A) > proximity(Q,B)
Semantic Proximity Score
Query
Candidate
API List
1.4M
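Semantic proximity can be illustrated with cosine similarity between word vectors. The sketch below uses hand-made 2-dimensional vectors in place of embeddings learned from the Stack Overflow corpus, and it averages the similarity over the query words; the vectors, the averaging, and the names `cosine` and `proximity` are all assumptions for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def proximity(query_words, api, vectors):
    """Mean cosine similarity between an API class and each known query word."""
    sims = [cosine(vectors[w], vectors[api]) for w in query_words if w in vectors]
    return sum(sims) / len(sims) if sims else 0.0


# Toy vectors, invented for this example (real embeddings have many dimensions).
vectors = {
    "image": [1.0, 0.0],
    "gray": [0.6, 0.8],
    "BufferedImage": [0.8, 0.6],
    "IOException": [0.0, 1.0],
}
query = ["image", "gray"]
print(proximity(query, "BufferedImage", vectors) > proximity(query, "IOException", vectors))  # True
```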
Impact of NLP2API on Search Query
Convert image to gray scale without losing transparency 115
BufferedImage Grayscale ImageEdit ColorConvertOp File Transparency
ColorSpace BufferedImageOp Graphics ImageEffects
02
Convert image to gray scale without losing transparency
Masud Rahman, PhD
Assistant Professor
Faculty of Computer Science
Office: 218, Goldberg CS Building
Dalhousie University, Canada
masud.rahman@dal.ca
Interested in more details? Please visit: https://web.cs.dal.ca/~masud/raise
Query Expansion with Relevant API Classes
Borda Score Semantic Proximity
Score
Initial Query Expanded Query
Ranked API
Classes
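The final expansion step might look like the sketch below: combine the two scores for each candidate class, keep the top few, and append them to the initial query. The equal-weight sum, the normalisation of Borda scores by their maximum, the `expand_query` name, and the example scores are assumptions, not details taken from the slides.

```python
def expand_query(query, borda, proximity, top_k=3):
    """Append the top_k API classes, ranked by combined score, to the query."""
    max_borda = max(borda.values(), default=0) or 1
    combined = {
        # Borda points are scaled to [0, 1] so both scores share a range.
        api: borda[api] / max_borda + proximity.get(api, 0.0)
        for api in borda
    }
    top = sorted(combined, key=combined.get, reverse=True)[:top_k]
    return query + " " + " ".join(top)


# Hypothetical scores for three candidate API classes.
borda = {"BufferedImage": 8, "ColorConvertOp": 6, "IOException": 1}
proximity = {"BufferedImage": 0.88, "ColorConvertOp": 0.9, "IOException": 0.4}

expanded = expand_query(
    "Convert image to gray scale without losing transparency",
    borda,
    proximity,
    top_k=2,
)
print(expanded)
# Convert image to gray scale without losing transparency BufferedImage ColorConvertOp
```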

HereWeCode 2022: Dalhousie University


Editor's Notes

  1. This is my academic journey. I completed my undergrad at Khulna University back in 2009. Then I came to Canada in 2012 for my graduate studies. I completed my Master's and PhD at the University of Saskatchewan. Then in 2019, I moved to Polytechnique Montreal as a postdoctoral fellow.
  2. This is an example bug report! It talks about a software bug! Now, the bug is hidden somewhere in the software code. So, the process of finding that buggy code is called bug localization.
  3. This is an example of a noisy bug report. It contains stack trace information. It contains five hundred keywords. That means too many signals. It is hard to separate the signals from the noise. That means the bug report does not work well as a search query. So, how do we solve this problem? How can we make a query and find the bug?
  4. Once we have the graph, we use a graph-based algorithm called PageRank for keyword selection. Now, this is a recursive algorithm, and it's a bit complex. But I will try to explain it with this diagram. So, the algorithm is based on a voting mechanism. That is, if a node gets enough votes from other important nodes, then this node is likely to be important. So, why is this guy bigger and laughing? Because he is being voted for by other important nodes. So, this is a recursive process. Once the computation is over, we get a ranked list of nodes. That means, from the trace graph, we get a ranked list of method and class names. But how does it help? Let's see.
  5. Now, let me show you how Information Retrieval-based bug localization works. This is an example bug report, and we want to find this buggy code in the codebase. Now, what were the existing approaches doing? They tried to use this whole bug report as a query. Then they submitted this query to a code search engine that sits on top of our codebase. When I say search engine, I mean local search engines like Lucene. Now this ad hoc query returns the buggy code at the 53rd position. That means the developer needs to check 52 non-buggy results before reaching the buggy code, which is time-consuming and not good for developer productivity. However, if we look closely, we see that this bug report contains 127 words. If we can carefully choose the right keywords from them and use them as a query, we can retrieve the buggy code at the topmost position, which is exactly what we want, right? Now obviously, identifying these keywords is extremely challenging, which makes it our first research problem. So, we reduce the bug localization problem to a keyword selection problem ☺
  6. Developers search for software bugs and features within a local codebase. However, searching within a local codebase might not be enough. They need to search on the web. A study shows that they spend 20% of their time on code search on the web. So, let's say the developer is implementing a software feature and needs code that can convert an image to gray scale without losing the transparency. Now, as a standard practice, the developer writes this natural language query and submits it to GitHub, the largest code repository on the web. Now, GitHub provides this result. But as you see, it does not look very relevant. The developer is looking for something like this. If we look carefully, we see that it is pretty hard to retrieve this code with this query, because there is not enough keyword matching. But if we can replace this natural language query with these relevant API classes, then we can get lots of keyword matching and we can easily retrieve this code. But as you can imagine, transforming the NL query into these relevant API classes could be very challenging! So, this makes it our third research problem. And once again, Stack Overflow is our friend in this grand challenge.
  7. First, we submit the query to Stack Overflow, which returns a list of relevant Q&A threads. What is a Q&A thread? Well, here is an example. This is a question, and this is the answer. We also see that the answer contains several program elements such as API classes. So, what do we do? We capture these program elements using regular expressions. Then we use two keyword selection algorithms to make two lists of candidate API classes. But then what? We use two more metrics to detect the most relevant API classes from these lists. The essence of Borda count is this: if API A is more frequent than API B in the relevant Q&A threads from Stack Overflow, A is more appropriate than B. So, it's a kind of likelihood of A over B for the target query. For the second metric, we preprocess the Stack Overflow corpus and develop a Skip-gram model using FastText, an improved version of Word2Vec. Then we determine how close an API is to the given query keywords within the semantic space. So, if A is semantically closer to query Q than B, then A is more appropriate than B for the query. We then combine these two metrics for each candidate API class, do the ranking, and return the Top-K classes as our reformulation terms.
  8. Since we have two candidate lists, we use Borda count to find the most relevant API classes. Now, how does it work? If Bernie wins according to multiple polls, he is a better candidate. Similarly, if API A ranks higher than API B in multiple ranked lists, then API A is more relevant in our problem context. So, yes, this is how we get the Borda score for all the candidates.
  9. We also use another way to find the most relevant API classes. This is called semantic proximity. For doing that, first, we collect 1.4 million Q&A threads from Stack Overflow and make a corpus. We preprocess them and feed them to FastText. Now, FastText is a neural text classifier tool that is basically a 3-layer neural network. What it does is transform the corpus into a semantic space. Now, what is a semantic space? This is an example of a semantic space for food names. Here we see that Ramen is closer to Spaghetti than to Burger. Well, if you have tasted these foods, then it makes sense, right? Similarly, we create a semantic space for API classes and keywords and determine their semantic proximity. Finally, we get the semantic proximity score for each API class.
  10. So, here is the result. If we use only the natural language query, we can retrieve this code example at the bottom of the list. But when we use the query from our tool, it returns the same relevant code at the 2nd position, which is really interesting. More extensive experiments can be found in the paper.
  11. Now, we have two scores for each class. What do we do? We combine them, rank them, and then collect the top few API classes. Then we add them to the natural language query to get the expanded search query.