An earthquake is a violent tremor in the earth’s crust that sends out a series of shock waves in all directions from its place of origin, or epicenter.
On the morning of January 26, 2001, India’s 52nd Republic Day, a devastating earthquake struck the Kutch district of the state of Gujarat.
Modeling of the Active Wedge behind a Gravity Retaining Wall (RexRadloff)
The Rankine horizontal stress method is a very common and simple approach to calculating the active force behind a gravity retaining wall. However, because numerous assumptions have to be made, a deviation among results will arise; the degree of this discrepancy has previously been defined as negligible and is typically ignored.
A full wedge analysis was performed using the program Wolfram Mathematica® to specifically quantify the degree of this discrepancy. Results showed that the deviation in the calculated active force depended on the conditions of the retaining system and could at times be substantial. Additionally, it was revealed that the difference among results could not be classified as negligible or substantial unless a full wedge analysis was performed.
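As a point of reference for the comparison described above, here is a minimal sketch of the Rankine active-force calculation for a cohesionless, level backfill. The function name and the example values are illustrative assumptions, not taken from the paper:

```python
import math

def rankine_active_force(gamma: float, H: float, phi_deg: float) -> float:
    """Rankine active force per unit wall length:
    Pa = 0.5 * Ka * gamma * H^2, with Ka = tan^2(45 - phi/2)
    for a cohesionless backfill with a level surface."""
    Ka = math.tan(math.radians(45.0 - phi_deg / 2.0)) ** 2
    return 0.5 * Ka * gamma * H ** 2

# Illustrative case: 18 kN/m^3 backfill, 5 m wall, phi = 30 degrees
# Ka = tan^2(30) = 1/3, so Pa = 0.5 * (1/3) * 18 * 25 = 75 kN/m
```

A full wedge analysis, by contrast, searches over trial failure-plane angles, which is where the deviations the abstract discusses arise.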
Stress in Bar of Uniformly Tapering Rectangular Cross Section | Mechanical En... (Transweb Global Inc)
Strength of Materials is a branch of applied mechanics that deals with the behavior of solid bodies subjected to various forces. It is also known as Mechanics of Materials or Mechanics of Solids. Copy the link given below and paste it into a new browser window for more information on Stress in Bar of Uniformly Tapering Rectangular Cross Section: http://www.transtutors.com/homework-help/mechanical-engineering/simple-stresses-and-strain/stress-in-bar-of-uniformly-tapering-rectangular-cross-section.aspx
High resolution mass spectrometry (HRMS) and non-targeted analysis (NTA) are advancing the identification of emerging contaminants in environmental and agricultural matrices. However, confidence in structure identification of unknowns in NTA presents challenges to analytical chemists. Structure identification requires integration of complementary data types such as reference databases, fragmentation prediction tools, and retention time prediction models. The goal of this research is to optimize and implement structure identification functionality within the US EPA’s CompTox Chemicals Dashboard, an open chemistry resource and web application containing data for ~760,000 substances. Rank-ordering the number of sources associated with chemical records within the Dashboard (Data Source Ranking) improves the identification of unknowns by bringing the most likely candidate structures to the top of a search results list. Incorporating additional data streams contained within the database underlying the Dashboard further enhances identifications. Integrating tandem mass spectrometry data into NTA workflows enables spectral match scores and increases confidence in structural assignments. We have generated and stored predicted MS/MS fragmentation spectra for the entirety of the Chemistry Dashboard using the in silico prediction tool CFM-ID. Predicted fragments incorporated into the identification workflow were used as both a scoring term and as a candidate threshold cutoff. Combining these steps within an open chemistry resource provides a freely available software tool for structure identification and NTA. This abstract does not necessarily represent the views or policies of the U.S. Environmental Protection Agency.
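Spectral match scoring of the kind mentioned above is commonly implemented as a binned dot-product (cosine) similarity between a measured and a predicted spectrum. A minimal sketch, with the bin width chosen for illustration rather than taken from any actual NTA workflow:

```python
import math
from collections import defaultdict

def bin_spectrum(peaks, bin_width=0.01):
    """Bin (m/z, intensity) peaks onto a fixed grid so two spectra can be compared."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[round(mz / bin_width)] += intensity
    return binned

def spectral_match_score(measured, predicted, bin_width=0.01):
    """Cosine (dot-product) similarity between a measured and a predicted spectrum;
    1.0 means identical binned peak patterns, 0.0 means no shared peaks."""
    a = bin_spectrum(measured, bin_width)
    b = bin_spectrum(predicted, bin_width)
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0
```

Real tools often add intensity or m/z weighting before the dot product; the plain cosine shown here is the simplest variant.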
The success of developer forums like Stack Overflow (SO) depends on the participation of users and the quality of shared knowledge. SO allows its users to suggest edits to improve the quality of posts (e.g., questions and answers). Such posts can be rolled back to an earlier version when the current version with the suggested edit does not satisfy a user. However, subjectivity bias in deciding whether an edit is satisfactory could introduce inconsistencies in the rollback edits. For example, while one user may accept the formatting of a method name (e.g., getActivity()) as a code term, another user may reject it. Such bias in rollback edits can be detrimental and demotivating to the users whose suggested edits were rolled back. This problem is compounded by the absence of specific guidelines and tools to support consistency across users in their rollback actions. To mitigate this problem, we investigate the inconsistencies in the rollback editing process of SO and make three contributions. First, we identify eight inconsistency types in rollback edits through a qualitative analysis of 777 rollback edits in 382 questions and 395 answers. Second, we determine the impact of the eight rollback inconsistencies by surveying 44 software developers. More than 80% of the study participants find our catalogue of rollback inconsistencies to be detrimental to post quality. Third, we develop a suite of algorithms to detect the eight rollback inconsistencies. The algorithms offer more than 95% accuracy and thus can be used to automatically and reliably inform SO users of the prevalence of inconsistencies in their suggested edits and rollback actions.
Presentation made during the Intelligent User-Adapted Interfaces: Design and Multi-Modal Evaluation (IUadaptME) workshop, conducted as part of UMAP 2018.
Relevance Improvements at Cengage - Ivan Provalov (lucenerevolution)
In this session we describe relevance improvements we have implemented in our Lucene-based search system for English and Chinese content, and the tests we have performed for Arabic and Spanish content based on TREC data. We will also describe our relevance feedback web app that lets end-users rank the results of various queries. The presentation will cover the usage data we analyze to improve relevance, and we will also touch upon our OCR data indexing challenges for English and non-English content.
The identification of chemicals in environmental media depends on the application of analytical methods, the primary approach being one of multiple mass spectrometry techniques. Cheminformatics solutions are critical to supporting the chemical identification process. This includes the assembly of large chemical substance databases, prioritization ranking of candidate search hits, and search approaches that support both targeted and non-targeted screening. The US Environmental Protection Agency CompTox Chemicals Dashboard is a web-based application providing access to data for over 760,000 chemical substances. This includes physicochemical property data, environmental fate and transport data, both human and ecological toxicity data, information regarding chemicals contained in products in commerce, and in vitro bioactivity data. Searches are allowed based on chemical identifiers, product and use categories, genes and assays associated with the EPA ToxCast assays and, specific to supporting mass spectrometry, searches based on masses and formulae. These searches make use of a novel “MS-Ready structures” approach, collapsing chemicals related as mixtures, salts, stereoforms, and isotopomers. The dashboard supports both singleton and batch searching by accurate mass or chemical formula, supported by MS-Ready structures, and utilizes rich metadata to facilitate candidate ranking and the prioritization of chemicals of concern based on toxicity and exposure data. The dashboard also hosts tens of chemical lists that have been assembled from public databases, many supporting non-targeted analysis and mass spectrometry databases.
This presentation will provide an overview of the dashboard and will review our latest research into structure identification by searching experimental mass spectrometry data against predicted fragmentation spectra for LC-MS (positive and negative ion mode) and GC-MS (EI), a total of 3 million predicted spectra. We will also provide an overview of our progress supporting structure and substructure searching, using mass and formula-based filtering, and report on the latest applications of the dashboard to support structure identification projects of interest to the EPA. This abstract does not necessarily represent the views or policies of the U.S. Environmental Protection Agency.
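Accurate-mass candidate searching of the kind described above typically ranks database records by their mass error (in ppm) relative to the observed mass. A toy sketch under that assumption; the three-entry candidate table below is a hypothetical stand-in for the Dashboard's real database, with approximate monoisotopic masses used purely for illustration:

```python
# Hypothetical candidate table (name -> approximate monoisotopic mass, Da).
# A real tool would compute these masses from elemental composition.
CANDIDATES = {
    "Atrazine": 215.09377,
    "Caffeine": 194.08038,
    "Bisphenol A": 228.11503,
}

def search_by_mass(observed_mass: float, ppm_tol: float = 5.0):
    """Return (name, ppm_error) pairs within ppm_tol of the observed mass,
    ranked by increasing mass error."""
    hits = []
    for name, mass in CANDIDATES.items():
        ppm_error = abs(observed_mass - mass) / mass * 1e6
        if ppm_error <= ppm_tol:
            hits.append((name, ppm_error))
    return sorted(hits, key=lambda h: h[1])
```

In practice the mass-error ranking is only a first pass; data-source ranking and spectral match scores, as described in the abstract, then reorder the surviving candidates.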
Semantics2018 Zhang, Petrak, Maynard: Adapted TextRank for Term Extraction: A G... (Johann Petrak)
Slides for the talk about the paper:
Ziqi Zhang, Johann Petrak and Diana Maynard, 2018: Adapted TextRank for Term Extraction: A Generic Method of Improving Automatic Term Extraction Algorithms. Semantics-2018, Vienna, Austria
Ph.D. Dissertation Presentation
B. Thomas Golisano College of Computing and Information Sciences
Rochester Institute of Technology
Date of presentation: June 28, 2022
Location: Virtual
Link to dissertation: https://scholarworks.rit.edu/theses/11219/
Similar to PhD Comprehensive exam of Masud Rahman (20)
RAISE Lab at Dalhousie University aims to develop tools and technologies for intelligent automation in software engineering. An overview is presented by Dr. Masud Rahman, Assistant Professor, Faculty of Computer Science, Dalhousie University, Canada.
The Forgotten Role of Search Queries in IR-based Bug Localization: An Empiric... (Masud Rahman)
Being lightweight and cost-effective, IR-based approaches for bug localization have shown promise in finding software bugs. However, the accuracy of these approaches heavily depends on the bug reports they use. A significant number of bug reports contain only plain natural language texts. According to existing studies, IR-based approaches cannot perform well when they use these bug reports as search queries. On the other hand, recent evidence suggests that even these natural-language-only reports contain enough good keywords to help localize the bugs successfully. On one hand, these findings suggest that natural-language-only bug reports might be a sufficient source of good query keywords. On the other hand, they cast serious doubt on the query selection practices in IR-based bug localization. In this article, we attempt to clarify this issue by conducting an in-depth empirical study that critically examines the state-of-the-art query selection practices in IR-based bug localization. In particular, we use a dataset of 2,320 bug reports, employ ten existing approaches from the literature, exploit a Genetic Algorithm-based approach to construct optimal or near-optimal search queries from these bug reports, and then answer three research questions. We confirmed that the state-of-the-art query construction approaches are indeed not sufficient for constructing appropriate queries (for bug localization) from certain natural-language-only bug reports. However, these bug reports do contain high-quality search keywords in their texts, even though they might not contain explicit hints for localizing bugs (e.g., stack traces). We also demonstrate that optimal queries and non-optimal queries chosen from bug report texts differ significantly in terms of several keyword characteristics (e.g., frequency, entropy, position, part of speech).
Such an analysis has led us to four actionable insights on how to choose appropriate keywords from a bug report. Furthermore, we demonstrate a 27%–34% improvement in the performance of non-optimal queries through the application of our actionable insights. Finally, we summarize our study findings with future research directions (e.g., machine intelligence in keyword selection).
Preprint: https://bit.ly/39nAoun
Publication URL: https://bit.ly/3xVUxlq
Replication package: https://bit.ly/36T8oxL
More details: https://web.cs.dal.ca/~masud
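The keyword characteristics discussed in the abstract (frequency, entropy, position, part of speech) suggest simple scoring heuristics for picking query terms from a bug report. A toy sketch using just two of them, frequency and earliest position; the weighting scheme and the helper name are illustrative assumptions, not the paper's actual method:

```python
from collections import Counter

def keyword_scores(report_text: str, top_k: int = 5):
    """Rank candidate query keywords from a bug report text by two of the
    characteristics discussed above: term frequency and earliest position.
    Purely illustrative; a real approach would also use entropy and POS."""
    words = [w.lower() for w in report_text.split() if w.isalpha() and len(w) > 3]
    freq = Counter(words)
    first_pos = {}
    for i, w in enumerate(words):
        first_pos.setdefault(w, i)

    def score(w):
        # Frequent words that appear early (e.g., in the title) score higher.
        return freq[w] * (1.0 / (1 + first_pos[w]))

    return sorted(set(words), key=score, reverse=True)[:top_k]
```

A genetic-algorithm search like the one in the article would instead evolve keyword subsets toward queries that actually rank the buggy files highly.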
Effective Reformulation of Query for Code Search using Crowdsourced Knowledge... (Masud Rahman)
An effective query reformulation technique that adopts crowdsourced knowledge and large-scale data analytics from the Stack Overflow Q&A site to improve source code search.
The slides provide a broad overview of the SOAP protocol and demonstrate a working example that uses SOAP for RPC, implemented with WCF/Visual Studio and Apache Axis.
Executive Directors Chat: Leveraging AI for Diversity, Equity, and Inclusion (TechSoup)
Let’s explore the intersection of technology and equity in the final session of our DEI series. Discover how AI tools, like ChatGPT, can be used to support and enhance your nonprofit's DEI initiatives. Participants will gain insights into practical AI applications and get tips for leveraging technology to advance their DEI goals.
How to Add Chatter in the Odoo 17 ERP Module (Celine George)
In Odoo, the chatter is like a chat tool that helps you work together on records. You can leave notes and track things, making it easier to talk with your team and partners. Inside chatter, all communication history, activity, and changes will be displayed.
How to Build a Module in Odoo 17 Using the Scaffold Method (Celine George)
Odoo provides an option for creating a module by using a single line command. By using this command the user can make a whole structure of a module. It is very easy for a beginner to make a module. There is no need to make each file manually. This slide will show how to create a module using the scaffold method.
Strategies for Effective Upskilling is a presentation by Chinwendu Peace in a Your Skill Boost Masterclass organised by the Excellence Foundation for South Sudan on 8th and 9th June 2024, from 1 PM to 3 PM on each day.
A review of the growth of the Israel Genealogy Research Association Database Collection over the last 12 months. Our collection has now passed the 3 million mark and is still growing. See which archives have contributed the most, the different types of records we have, and which years have had records added. You can also see what we have planned for the future.
Delivering Micro-Credentials in Technical and Vocational Education and Training (AG2 Design)
Explore how micro-credentials are transforming Technical and Vocational Education and Training (TVET) with this comprehensive slide deck. Discover what micro-credentials are, their importance in TVET, the advantages they offer, and the insights from industry experts. Additionally, learn about the top software applications available for creating and managing micro-credentials. This presentation also includes valuable resources and a discussion on the future of these specialised certifications.
For more detailed information on delivering micro-credentials in TVET, visit this https://tvettrainer.com/delivering-micro-credentials-in-tvet/
Safalta Digital Marketing Institute in Noida provides complete programs that cover a wide range of digital marketing components, including search engine optimization, digital communication marketing, pay-per-click marketing, content marketing, web analytics, and more. These courses are designed to give students a comprehensive understanding of digital marketing strategies. The institute is a first choice for young individuals and students looking to start their careers in the field of digital advertising. It offers specialized courses and certifications designed for beginners, providing thorough training in areas such as SEO, digital communication marketing, and PPC training in Noida. After finishing the program, students receive certifications recognised by top universities, setting a strong foundation for a successful career in digital marketing.
Thinking of getting a dog? Be aware that breeds like Pit Bulls, Rottweilers, and German Shepherds can be loyal and dangerous. Proper training and socialization are crucial to preventing aggressive behaviors. Ensure safety by understanding their needs and always supervising interactions. Stay safe, and enjoy your furry friends!
A workshop hosted by the South African Journal of Science aimed at postgraduate students and early career researchers with little or no experience in writing and publishing journal articles.
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor... (Levi Shapiro)
Letter from the Congress of the United States regarding anti-Semitism, sent June 3rd to MIT President Sally Kornbluth and MIT Corporation Chair Mark Gorenberg.
Dear Dr. Kornbluth and Mr. Gorenberg,
The US House of Representatives is deeply concerned by ongoing and pervasive acts of antisemitic harassment and intimidation at the Massachusetts Institute of Technology (MIT). Failing to act decisively to ensure a safe learning environment for all students would be a grave dereliction of your responsibilities as President of MIT and Chair of the MIT Corporation.
This Congress will not stand idly by and allow an environment hostile to Jewish students to persist. The House believes that your institution is in violation of Title VI of the Civil Rights Act, and the inability or unwillingness to rectify this violation through action requires accountability.
Postsecondary education is a unique opportunity for students to learn and have their ideas and beliefs challenged. However, universities receiving hundreds of millions of federal funds annually have denied students that opportunity and have been hijacked to become venues for the promotion of terrorism, antisemitic harassment and intimidation, unlawful encampments, and in some cases, assaults and riots.
The House of Representatives will not countenance the use of federal funds to indoctrinate students into hateful, antisemitic, anti-American supporters of terrorism. Investigations into campus antisemitism by the Committee on Education and the Workforce and the Committee on Ways and Means have been expanded into a Congress-wide probe across all relevant jurisdictions to address this national crisis. The undersigned Committees will conduct oversight into the use of federal funds at MIT and its learning environment under authorities granted to each Committee.
• The Committee on Education and the Workforce has been investigating your institution since December 7, 2023. The Committee has broad jurisdiction over postsecondary education, including its compliance with Title VI of the Civil Rights Act, campus safety concerns over disruptions to the learning environment, and the awarding of federal student aid under the Higher Education Act.
• The Committee on Oversight and Accountability is investigating the sources of funding and other support flowing to groups espousing pro-Hamas propaganda and engaged in antisemitic harassment and intimidation of students. The Committee on Oversight and Accountability is the principal oversight committee of the US House of Representatives and has broad authority to investigate “any matter” at “any time” under House Rule X.
• The Committee on Ways and Means has been investigating several universities since November 15, 2023, when the Committee held a hearing entitled From Ivory Towers to Dark Corners: Investigating the Nexus Between Antisemitism, Tax-Exempt Universities, and Terror Financing. The Committee followed the hearing with letters to those institutions on January 10, 202
PhD Comprehensive exam of Masud Rahman
1. A SYSTEMATIC LITERATURE REVIEW OF AUTOMATED QUERY REFORMULATIONS IN SOURCE CODE SEARCH
Masud Rahman
Department of Computer Science
University of Saskatchewan, Canada
Advisor: Dr. Chanchal K. Roy
@masud233
2. MASUD RAHMAN: ACADEMICS
2019: PhD (In Progress), University of Saskatchewan (Award: Dr. Keith Geddes Award)
2014: MSc, University of Saskatchewan (Award: Best MSc Thesis Nomination)
2009: BSc, Khulna University, Bangladesh (Award: President Gold Medal)
Masud Rahman, UofS
5. BACKGROUND CONCEPTS
Two background concepts: (1) Source Code Search, (2) Automated Query Reformulation
P1 P2 P3 P4
6. MCAS: A SOFTWARE BUG THAT KILLS
Boeing 737 MAX 8
7. A TALE OF SOURCE CODE SEARCH
Diagram: a Boeing customer submits an MCAS bug report; a Boeing developer performs a code search over the Boeing codebase, supported by query suggestion and query reformulation.
8. QUERY REFORMULATION: 2 WORKING CONTEXTS
Two working contexts: (1) local code search (e.g., bug localization in a local codebase such as Boeing's); (2) Internet-scale code search (e.g., GitHub).
14. SYSTEMATIC LITERATURE REVIEW: 6 STEPS
Research questions (7 RQs) → search keywords → literature search → literature bulk → noise filtration → primary studies → in-depth investigation
15. SYSTEMATIC LITERATURE REVIEW: PRIMARY STUDY SELECTION
Publication databases searched (11): ACM DL, CrossRef, DBLP, Mendeley, Google Scholar, IEEE Xplore, ProQuest, ScienceDirect, SpringerLink, Web of Science, Wiley Online Library.
Selection funnel: 2871 initial results → 2317 after impurity removal → 562 after filtering by title → 195 after filtering by abstract → 93 after merging & duplicate removal → 56 primary studies after filtering by full texts.
Search keyword set 1: information retrieval, IR, text retrieval, TR, bug localization, concept location, feature location, FLT, concern location, Internet-scale code search, code search engine, search engine, local code search, code search, source code search, and code search query.
+
Search keyword set 2: query reformulation, query expansion, query reduction, query formulation, query refinement, automated query expansion, AQE, query suggestion, query recommendation, term selection, query replacement, query difficulty, query quality, keyword selection, keyword extraction, search term identification, search query, search term, and search keyword.
16. OUR RESEARCH QUESTIONS
RQ1: Which methods, algorithms, and data sources have been used for automated query reformulations targeting code search in the literature?
RQ2: Which methods, metrics, or subject systems have been used to evaluate and validate the research on automated query reformulations?
RQ3: What are the major challenges of automated query reformulations intended for code search? How many of them have been solved to date in the literature?
RQ4: How much research on automated query reformulations has been performed to date? At which venues has this research been published?
RQ5: What are the differences and similarities between query reformulations for local code search and query reformulations for Internet-scale code search?
RQ6: Which of the term weighting, term-query co-occurrence, and thesaurus-based approaches is most appropriate for query keyword selection?
RQ7: What are the scopes for future work in the area of automated query reformulation targeting code search?
17. RQ1: WHICH METHODS & ALGORITHMS ARE USED BY LITERATURE?
Grounded Theory:
• Open coding
• Axial coding
• Selective coding
18. RQ1: WHICH ALGORITHMS & REFORMULATION TYPES ARE USED BY LITERATURE?
19. RQ2: WHICH EVALUATION & VALIDATION SETTINGS ARE EMPLOYED?
20. RQ3: WHAT ARE COMMON CHALLENGES & LIMITATIONS OF EXISTING LITERATURE?
Grounded Theory:
• Open coding
• Axial coding
• Selective coding
21. RQ3: WHAT ARE COMMON CHALLENGES & LIMITATIONS OF EXISTING LITERATURE?
22. RQ4: PUBLICATION STATS & INTERESTS ON QUERY REFORMULATION RESEARCH
23. RQ5: COMPARISON BETWEEN LOCAL CODE SEARCH & INTERNET-SCALE CODE SEARCH
Legend: TW = Term Weighting, TQC = Term-Query Co-occurrence, TS = Thesaurus, ON = Ontology, SLM = Search Log Mining, ML = Machine Learning, HM = Heuristics & Miscellaneous
24. RQ5: COMPARISON BETWEEN LOCAL CODE SEARCH & INTERNET-SCALE CODE SEARCH
Legend: CH1 = Vocabulary Mismatch Unsolved, CH2 = Extra Burden on Developers, CH3 = Lack of Generalizability, CH4 = Lack of Practical Use, CH5 = Inappropriate Use of Tools, CH6 = Human Bias + Weak Evaluation, CH7 = Unverified Assumptions
25. RQ6: CHALLENGES WITH THREE KEYWORD SELECTION METHODS
Method                   | #Study   | CH1 | CH2 | CH3 | CH6
Term Weighting           | 22 (39%) | 36% | 18% | 91% | 50%
Term-Query Co-occurrence | 11 (20%) | 9%  | 27% | 64% | 91%
Thesaurus                | 17 (30%) | 12% | 12% | 47% | 41%
Legend: CH1 = Vocabulary Mismatch Unsolved, CH2 = Extra Burden on Developers, CH3 = Lack of Generalizability, CH6 = Human Bias + Weak Evaluation
27. R1: KEYWORD SELECTION FROM BUG REPORT
A bug report consists of a Title and a Description. Candidate queries and their Query Effectiveness (QE; lower is better):
ID | Query                                                    | QE
1  | Custom search results view iresource                     | 1331
2  | Custom search results search results view                | 636
3  | element iresource provider level tree                    | 01
4  | Custom search results hierarchically java search results | 570
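The QE measure shown on this slide is conventionally the 1-based rank of the first relevant (e.g., buggy) file returned by a query, so lower is better. A minimal sketch under that assumption:

```python
def query_effectiveness(ranked_files, buggy_files):
    """Query Effectiveness (QE): 1-based rank of the first buggy file in the
    result list; lower is better. Returns None if no buggy file is retrieved."""
    buggy = set(buggy_files)
    for rank, f in enumerate(ranked_files, start=1):
        if f in buggy:
            return rank
    return None
```

For example, a query whose results list the buggy file third has QE = 3, while a QE in the hundreds (like 1331 above) means the developer would have to inspect hundreds of files before finding the fault.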
28. R2: TERM WEIGHTING FOR SOURCE CODE
TF-IDF(t) = (1 + log(f_{t,d})) × log(N / df_t), where f_{t,d} is the frequency of term t in document d, N is the total number of documents, and df_t is the number of documents containing t.
• Different syntax
• Different semantics
• Different structures
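The standard log-scaled TF-IDF weighting used in slides like this one can be sketched directly in code; the variable names follow the usual textbook formula, not any specific tool's implementation:

```python
import math
from collections import Counter

def tf_idf(term: str, doc: list, corpus: list) -> float:
    """TF-IDF(t, d) = (1 + log f_{t,d}) * log(N / df_t).
    doc is a list of tokens; corpus is a list of such token lists."""
    f_td = Counter(doc)[term]
    if f_td == 0:
        return 0.0  # term absent from the document
    df_t = sum(1 for d in corpus if term in d)
    N = len(corpus)
    return (1 + math.log(f_td)) * math.log(N / df_t)
```

For source code, as the bullets note, the tokens fed into such a weighting come from identifiers and structured syntax rather than natural-language prose, which is why code-specific term weighting is a research question in its own right.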
Hello everyone! Good afternoon!
Thank you all for coming and attending this talk.
My name is Masud Rahman. I am a PhD student in the Department of Computer Science.
I work with Dr. Chanchal K. Roy.
Today, I will be talking about automated query reformulations for code search.
A little bit of background about Me:
Currently, I am a PhD student at USASK.
I completed my MSc in Software Engineering from the same university in 2014.
Before that, I completed my BSc in Computer Science & Engineering from Khulna University, back in 2009.
I received a couple of awards along the way.
Today, my talk will be divided into four sections.
In the first section, I will provide a background overview on automated query reformulations & code search in general.
In the second section, I will present a systematic literature review of automated query reformulations.
In the third section, I will discuss the future research opportunities in this domain.
Finally, we will have a Q&A session.
Part 1: Background concepts.
If we look at my talk’s title, we can see two major concepts.
Source code search
Automated query reformulations.
Now we will go into details. But first, let's look at two recent events.
You are looking at two aircraft: Ethiopian Airlines and Lion Air Indonesia.
These are nose-down situations. Due to these nose-down situations, we had two fatal crashes in a single calendar year.
These crashes took 346 precious human lives and cost billions of dollars.
Now, the culprit is MCAS. This is a software component that was added to the Boeing 737 MAX 8.
The summary is, this is a faulty, poorly designed component, and it ultimately led to the crashes.
That is why, Boeing 737 Max planes are grounded right now.
Now, lets say, a Boeing customer has submitted a bug report.
Now, a Boeing developer is responsible for locating and repairing the faulty code triggering that bug.
As a common practice, the developer chooses a few important keywords and attempts to locate the buggy code within the Boeing codebase.
But the study shows that 88% of the keywords chosen by the developer are incorrect. That is, they do not return the buggy code.
So, the obvious next step is to reformulate the query through automated tool supports, so that the buggy code could be located.
There are also tools that take a bug report and suggest appropriate search queries in the first place.
Now, the developer not only searches in the Boeing codebase, she might search in the Internet-scale codebase such as GitHub as well.
So as discussed, the code search could be done in two working contexts.
It could be in a local codebase such as Boeing.
It could also be in the large-scale open source repository such as GitHub.
Now, based on these contexts, there are different challenges in query reformulation.
The local codebase is small, domain specific and organized.
On the contrary, GitHub is huge, cross-domain and very noisy.
So, yes, they need different strategies to suggest queries for them.
We can reformulate a search query in three ways.
-- It could be query expansion by adding new keywords.
-- It could be query reduction by discarding the noisy keywords.
-- Or it could be total query replacement by using a new set of keywords.
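The three reformulation strategies just listed can be sketched as small list operations; the function names are illustrative, not from any of the surveyed tools:

```python
def expand(query, new_terms):
    """Query expansion: add new keywords, preserving order and avoiding duplicates."""
    return query + [t for t in new_terms if t not in query]

def reduce(query, noisy_terms):
    """Query reduction: discard the noisy keywords."""
    noisy = set(noisy_terms)
    return [t for t in query if t not in noisy]

def replace(query, new_query):
    """Query replacement: use an entirely new set of keywords."""
    return list(new_query)
```

In the surveyed literature, the hard part is of course deciding which terms to add or drop; these helpers only express the three shapes a reformulation can take.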
Now, there are many steps in query reformulation.
But three major steps are common.
First, you need to collect feedback on the given query. The query is executed and top-K results are collected for developer inspection.
The developer marks each of them as relevant or irrelevant. This is step I.
In the second step, these annotated results are mined using various text mining tools, and candidate keywords are selected using various keyword selection methods.
In the third step, the most important keywords are returned to the developer for query expansion.
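The three-step loop described above resembles classic relevance feedback. A minimal sketch, with `search_fn` and `relevance_fn` as hypothetical stand-ins for the search engine and the developer's relevance judgments:

```python
from collections import Counter

def feedback_expansion(query, search_fn, relevance_fn, k=10, n_terms=3):
    """Three-step reformulation loop:
    (1) run the query and collect the top-K results for inspection,
    (2) mine the results the developer marked relevant for candidate keywords,
    (3) return the most frequent new terms appended to the query."""
    results = search_fn(query)[:k]                      # step 1: feedback collection
    relevant = [doc for doc in results if relevance_fn(doc)]
    counts = Counter(t for doc in relevant              # step 2: mining candidates
                     for t in doc if t not in query)
    best = [t for t, _ in counts.most_common(n_terms)]  # step 3: keyword selection
    return query + best
```

Here documents are modeled as token lists; real tools would use the keyword selection methods (term weighting, co-occurrence, thesauri) discussed later in the talk instead of raw frequency.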
Now, there are automated query reformulations and semi-automated query reformulations.
Positives:
It can improve code search performance up to 20%, which is significant.
It helps to redefine the information needs. Developers are often not sure which keywords to choose; automated suggestions can help them.
It also reduces the cost and effort of code search, e.g., in software maintenance tasks such as bug fixing.
Negatives:
Automated reformulation might degrade already good queries. If you already have a good query, you need to stop.
It also has a chance of topic drift. That is, through reformulations, the original topic might be lost.
Now, we are done with Background concepts, Part 1.
Now, we are going into Part 2 -- Systematic Literature Review.
Systematic literature review starts with several research questions.
We ask 7 research questions in our survey about automated query reformulations.
These questions are broken down into search keywords.
Then we use these keywords, and retrieve a bulk of literature from various publication databases.
Then we perform several steps for noise filtration, and select a set of primary studies on automated query reformulations.
Then we do in-depth investigation on these primary studies.
Now, let's take a closer look at this section.
We choose 11 publication databases, and collect about 3K results from these databases on automated query reformulations.
Then we remove the impurities from the results. Sometimes, keyword matching can produce unexpected results.
For example, these results contain studies from database management systems, multimedia retrieval or image retrieval.
Since we are looking for query reformulation for code search, we only keep results for code search, and discard the rest.
This step provides ~2300 results. That is still huge.
Then we filter the results by title and abstract. That is, we look at the title and abstract, and determine whether they are related to code search and query reformulations or not.
These steps provide us with 195 results.
Then we merge the results and remove duplicates. Still, the topics of a few results were not clear to us.
We thus read their full texts, especially the Introduction part.
Finally, we arrive at a collection of 56 studies after all these filtration steps.
We call these the primary studies on automated query reformulations.
We answer 7 research questions in our systematic survey.
We answer three general questions about methodology, evaluation and challenges/limitations from the existing literature.
We answer one statistical question.
Then we answer three specialized questions including future research opportunities.
In the first research question, we identify which underlying algorithms and methodologies are used by the existing literature.
To do that, we use the Grounded Theory approach, a well-known method for qualitative research.
How do we do that?
Well, we read the Introduction and Methodology sections of each of the 56 primary studies, and identify the algorithms and technologies used.
Since we are trying to develop a theory about the existing literature, we apply the three types of coding from Grounded Theory.
-- Open coding: In this stage, we describe each study with a list of appropriate key phrases. The idea is to keep an open mind, and use as many key phrases as possible.
-- Axial coding: In this stage, we try to make connections among different key phrases, and color code similar phrases. We basically look for topical similarity.
-- Selective coding: In this stage, we identify the underlying variables that explain a dependent variable.
Thus, we develop a mental model about existing literature based on our qualitative analysis of the primary studies.
Then we do various quantitative analysis using this theory.
For example, we discover that seven major methodologies and algorithms are employed in the query reformulations.
About 40% of the studies use term weighting approaches such as TF-IDF.
About 30% of the studies use a thesaurus such as WordNet for query expansion with synonyms.
Besides, 50% of the studies employ various advanced heuristics and ad hoc methods for query reformulation.
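To make the most common technique concrete, here is a minimal TF-IDF sketch (my own illustration, not code from any surveyed study): a term scores high if it is frequent in the current document but rare across the corpus. The toy corpus below is hypothetical.

```python
import math

def tf_idf(term, doc, corpus):
    """Term frequency times (smoothed) inverse document frequency."""
    tf = doc.count(term) / len(doc)                 # frequency within this doc
    df = sum(1 for d in corpus if term in d)        # docs containing the term
    idf = math.log(len(corpus) / (1 + df)) + 1      # rarer across corpus -> larger
    return tf * idf

corpus = [
    ["parser", "fails", "null", "input"],
    ["ui", "button", "color", "theme"],
    ["parser", "token", "stream"],
]
doc = corpus[0]
# "null" is rarer across the corpus than "parser", so it is weighted higher
print(tf_idf("null", doc, corpus), tf_idf("parser", doc, corpus))
```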
We also identify that 70% of the studies perform query expansion, which is the highest.
About 15%--25% of the studies perform query reduction or replacement.
Majority of the approaches do not collect any feedback during query reformulation.
We also discover that 40% of the literature on query reformulation targets Internet-scale code search.
The remaining studies target various code searches within a local codebase for bug localization, concept location and feature location.
In the second research question, we investigate which evaluation and validation approaches were used by the existing literature on query reformulations.
We found that 50% studies used more than two performance metrics.
50% studies used at least 2 subject systems for their experiments.
About 38% of the studies involve developers in their experiments, and 50% of those use fewer than 16 developers.
Most of the studies used some means to validate their work. 50% of the studies compare with at least 2 existing works.
In terms of search queries, we found that 50% of the studies used at least 74 queries for their evaluation and validation.
Now, we see that the subject systems and validation targets are often not sufficient, which leads to generalizability issues.
In the third research question, we identify the threats and limitations of the existing literature.
To do that, we consult the Methodology and Threats to Validity sections of each primary study.
In particular, we check the threats or issues reported by the authors, and identify several issues through inferences.
Like in RQ1, we apply the Grounded Theory approach, and identify the common challenges, issues and limitations of the existing literature.
We found seven major challenges/limitations in the existing literature. The details are in the report; we provide just a summary here.
We see that 80% studies suffer from one or more generalizability issues.
That is, they use subject systems from only a single programming platform (for example, findings from Java-based systems might not always generalize to C-based systems), involve an insufficient number of queries or developers, or do not validate their findings sufficiently.
We also see that 50% studies are affected by human bias and suffer from weak evaluation.
We also found that the vocabulary mismatch problem is not solved. Well, this is a long-standing problem in any type of document search, and all query reformulation approaches attempt to solve it. But we found that 30% of the studies do a poor job of it.
We also see that 30% of the studies impose extra cognitive burden on the developers during query reformulation/code search.
In the fourth research question, we find out the statistics on the research activities conducted on automated query reformulation over the last 15 years.
We see that the first work on query reformulation targeting concept location was published in 2004.
Then there was moderate activity. However, over the last 5-6 years, we see significant activity from the community.
Especially since 2013, we see major interest in this domain.
In terms of venue, we see that ASE and ICSE, which are A/A* conferences, are the pioneers. So, yes, this is top-quality research.
In fact, 70% of the primary studies were done in the last 5 years, which shows how promising this domain is.
These are some top authors. According to our investigation, about 150 researchers have worked or are working in this domain.
So, this is a well established and promising area for research.
Besides these analyses, we did a more in-depth comparison between local and Internet-scale code search in RQ5.
We see that local code search involves term weighting for keyword selection, since bug reports are available there.
On the contrary, people use a thesaurus for query expansion in code search on the web, since no bug reports are available.
More details can be found in the comprehensive paper.
We also see that queries in the Internet-scale code search suffer more from vocabulary mismatch issues.
In this case, the developers do not have supporting materials to help them formulate queries.
So, they generally guess some keywords to describe their information needs, and these keywords are often not sufficient.
This leads to vocabulary mismatch issue.
In the sixth research question, we see that term weighting has some connection with the vocabulary mismatch problem.
Inappropriate term weighting can choose inappropriate keywords for query reformulation.
This leads to noise in the query and degraded performance.
On the contrary, a thesaurus and term-query co-occurrence attempt to deliver synonyms or similar words.
They create comparatively fewer vocabulary mismatch issues.
OK! Now we are done with the literature survey.
Now, we will focus on the third part, the future research opportunities.
Let us see an example.
This is a bug report; this is the title, and this is the description.
Now, developer JOE would use this bug report to localize the bug in the source code.
Now, he chooses some ad hoc queries.
Which one do you think is the best here? PAUSE!
Well, let's see. This one returns the correct result at this position. That means the developer needs to check 1300+ results before reaching the correct one if he tries this query.
… oh… this one is the best.
So, selecting appropriate keywords from the bug report is not that simple.
Now, this is a metric that has been in play since the last century. It was proposed in the 70s.
It is a good metric, but it was actually proposed for regular texts such as news articles.
On the other hand, we are dealing with source code here.
Now, regular texts and source code have different semantics and different structures.
They are not the same.
So, metrics designed for regular texts are not appropriate for source code. This is our hypothesis.
That is, each of the three people (the customer, the past developer, and JOE) has their own vocabulary to describe a certain problem or concept.
In fact, the probability that any two people will describe the same problem with the same vocabulary is only 15%-20%.
So, naturally, developer JOE finds it a great challenge to make a connection between bug report and the buggy code.
This costs development time, money and valuable efforts.
Here we see that burger is close to sandwich. Why? Because they are eaten together? I do that all the time.
Well, that is not the case.
Rather, they are mentioned in similar contexts by people across the whole corpus.
The model recognizes such occurrences and thus puts burger and sandwich close together.
Similarly, dumpling and ramen are close to each other.
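A tiny sketch of the intuition above (my own illustration with made-up co-occurrence counts, not the actual model): words that appear with the same context words get similar vectors, and cosine similarity then places burger near sandwich but far from an unrelated word.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors: 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# hypothetical co-occurrence counts with context words: [eat, lunch, code, compile]
vectors = {
    "burger":   [9, 7, 0, 1],
    "sandwich": [8, 6, 1, 0],
    "compiler": [0, 1, 8, 9],
}
print(cosine(vectors["burger"], vectors["sandwich"]))  # high: similar contexts
print(cosine(vectors["burger"], vectors["compiler"]))  # low: different contexts
```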
Now, we propose this. This is the original query, and this is the reformulated query.
Now, a good reformulated query will cluster together with the original query.
A bad reformulated query will NOT cluster with the original query.
So, clustering tendency within the hyperspace is our weapon here.
We calculate the Hopkins statistic and the polygon area to measure this clustering tendency.
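For reference, here is a minimal sketch of the Hopkins statistic (a standard formulation, not the exact implementation from our study): it compares nearest-neighbor distances of uniformly random probe points against those of the real points; values near 0.5 suggest uniform scatter, values near 1.0 suggest clustering tendency. The toy data is hypothetical.

```python
import math
import random

def hopkins(points, m=10, seed=42):
    """Hopkins statistic of a point set: ~0.5 for uniform scatter,
    close to 1.0 when the points show clustering tendency."""
    rng = random.Random(seed)
    dims = len(points[0])
    lo = [min(p[d] for p in points) for d in range(dims)]
    hi = [max(p[d] for p in points) for d in range(dims)]

    def nn_dist(q, data):
        # distance from q to its nearest neighbor in data (excluding q itself)
        return min(math.dist(q, p) for p in data if p != q)

    samples = rng.sample(points, m)                       # real points
    uniforms = [tuple(rng.uniform(lo[d], hi[d]) for d in range(dims))
                for _ in range(m)]                        # random probes in bounding box
    w = sum(nn_dist(q, points) for q in samples)
    u = sum(nn_dist(q, points) for q in uniforms)
    return u / (u + w)

# two tight, well-separated clusters -> statistic well above 0.5
cluster = [(x / 100, y / 100) for x in range(5) for y in range(5)]
data = cluster + [(x + 5.0, y + 5.0) for x, y in cluster]
print(hopkins(data))
```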
Now, I am not going to discuss those studies in details.
But here is the glimpse.
Developers generally look for relevant code on the web using natural language query.
Please note that we are not talking about simple web search; rather, we are talking about searching source code repositories such as GitHub.
Now, GitHub provides this result. You see, it tries to match the query keywords with comments and identifiers.
But we are dealing with source code, right? So, we need a source-code-friendly query for better results.
So, we identify relevant API classes against this natural language query through extensive data mining and data analytics.
And once again, Stack Overflow is our friend in this grand challenge.
And that concludes my talk.
Thanks a lot for your attention.
Now, I am ready to take some questions.