The RAISE Lab at Dalhousie University aims to develop tools and technologies for intelligent automation in software engineering. An overview is presented by Dr. Masud Rahman, Assistant Professor, Faculty of Computer Science, Dalhousie University, Canada.
The Forgotten Role of Search Queries in IR-based Bug Localization: An Empiric… (Masud Rahman)
Being lightweight and cost-effective, IR-based approaches to bug localization have shown promise in finding software bugs. However, the accuracy of these approaches depends heavily on the bug reports they use. A significant number of bug reports contain only plain natural language text, and according to existing studies, IR-based approaches do not perform well when these reports are used as search queries. On the other hand, recent evidence suggests that even these natural-language-only reports contain enough good keywords to localize the bugs successfully. These findings suggest that natural-language-only bug reports might be a sufficient source of good query keywords, but they also cast serious doubt on the query selection practices in IR-based bug localization. In this article, we conduct an in-depth empirical study that critically examines the state-of-the-art query selection practices in IR-based bug localization. In particular, we use a dataset of 2,320 bug reports, employ ten existing approaches from the literature, exploit a Genetic Algorithm-based approach to construct optimal and near-optimal search queries from these bug reports, and then answer three research questions. We confirm that the state-of-the-art query construction approaches are indeed not sufficient for constructing appropriate queries (for bug localization) from certain natural-language-only bug reports. However, these bug reports do contain high-quality search keywords in their texts, even though they might not contain explicit hints for localizing bugs (e.g., stack traces). We also demonstrate that optimal and non-optimal queries chosen from bug report texts differ significantly in several keyword characteristics (e.g., frequency, entropy, position, part of speech).
Such an analysis has led us to four actionable insights on how to choose appropriate keywords from a bug report. Furthermore, we demonstrate 27%–34% improvement in the performance of non-optimal queries through the application of our actionable insights to them. Finally, we summarize our study findings with future research directions (e.g., machine intelligence in keyword selection).
Preprint: https://bit.ly/39nAoun
Publication URL: https://bit.ly/3xVUxlq
Replication package: https://bit.ly/36T8oxL
More details: https://web.cs.dal.ca/~masud
3. Real Life Software Bugs & Failures
Cost of software bugs: $1.7 trillion/year (global, 2017)
A software bug is a fault/error/flaw in a program that causes the program to behave unexpectedly. (Wikipedia)
4. A Tale of Software Bugs & Features!
- Find the bug
- Understand the bug
- Repair the bug/faulty code
- Find the right code for a feature
- Quality control of code-level changes
7. Search Keyword Selection from Trace Graph
PageRank Algorithm (Google), applied to the trace graph (nodes such as classes Ci, Cj, Cp and methods Mk, Mn):
S(vi) = (1 − φ) + φ · Σ_{vj ∈ In(vi)} S(vj) / |Out(vj)|
ESEC/FSE 2018
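The speaker notes describe PageRank as a voting process among graph nodes, and the formula on the slide can be run directly. Below is a minimal Python sketch over a made-up toy trace graph; the node names echo the stack-trace tokens from the bug-report example, but the graph itself is a hypothetical stand-in for one built from a real trace.

```python
# Iterative PageRank over a small directed "trace graph" whose nodes are
# class/method tokens from a stack trace. phi is the damping factor.
def pagerank(graph, phi=0.85, iters=100):
    """graph: dict mapping node -> list of outgoing neighbours."""
    incoming = {v: [u for u in graph if v in graph[u]] for v in graph}
    score = {v: 1.0 for v in graph}
    for _ in range(iters):
        score = {
            v: (1 - phi) + phi * sum(score[u] / len(graph[u]) for u in incoming[v])
            for v in graph
        }
    return score

# Hypothetical trace graph: edges follow caller -> callee order in a trace.
trace_graph = {
    "JDIValue": ["toString"],
    "toString": ["execute"],
    "execute": ["EvaluationThread", "toString"],
    "EvaluationThread": ["run"],
    "run": ["execute"],
}
scores = pagerank(trace_graph)
ranked = sorted(scores, key=scores.get, reverse=True)  # best keywords first
```

Nodes that receive votes from other highly ranked nodes ("execute" here) float to the top, which is exactly the voting intuition the notes describe.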
8. Find the bug using Information Retrieval
From a 127-word bug report, keyword selection yields the query: JDIValue, toString, execute, EvaluationThread, run, NullPointerException, able, cast, null. This query retrieves the buggy code at position 1.
9. Explain the bug (a.k.a., faulty software code)
• Rule-based explanation
• Not accurate
• Cryptic, hard to understand
10. Example 2: Explain a bug with GitHub
[Slide diagram: bug-fix pull requests link faulty code to commit messages]
11. Explain a bug with regular texts
[Slide diagram: Abstract Syntax Tree (AST), deep learning model, output message, e.g., "Convert input dense tensor"]
12. Example 3: Find the right software code
Query: "Convert image to gray scale without losing transparency"
Relevant API classes: BufferedImage, Grayscale, ImageEdit, ColorConvertOp, File, Transparency, ColorSpace, BufferedImageOp, Graphics, ImageEffects
13. Candidate API Collection from Stack Overflow
Query ("Convert image to Grayscale ….") → relevant Q&As → code elements → two candidate API lists, ranked by PageRank and by TF-IDF (candidates include IndexColorModel, ColorSpaceType, BufferedImageOp, Gray, ImageEffects, JPEGResize, Color, IOException, Graphics, ColorConvertOp)
ICSME 2018
14. Relevant API Selection with Borda Count
Borda count: A > B if ∑rank(A) > ∑rank(B)
Each candidate API receives a Borda score computed over the two ranked lists (candidate APIs by PageRank, candidate APIs by TF-IDF). [Slide illustration: an election between candidates A: Joe and B: Donald]
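The Borda rule on this slide can be sketched in a few lines: each list position contributes points, and a candidate's points across all lists are summed. The two candidate API lists below are illustrative, not actual tool output.

```python
# Borda count over multiple ranked lists: a candidate at position p in a list
# of n items earns (n - p) points; the point totals decide the final ranking.
def borda(*ranked_lists):
    scores = {}
    for ranking in ranked_lists:
        n = len(ranking)
        for pos, item in enumerate(ranking):
            scores[item] = scores.get(item, 0) + (n - pos)
    return sorted(scores, key=scores.get, reverse=True)

# Illustrative candidate lists (APIs ranked by two different algorithms).
by_pagerank = ["ColorConvertOp", "BufferedImageOp", "Gray", "IOException"]
by_tfidf = ["BufferedImageOp", "ColorConvertOp", "ImageEffects", "Gray"]
combined = borda(by_pagerank, by_tfidf)  # APIs ranked high by both lists come first
```

APIs that rank high in both lists (ColorConvertOp, BufferedImageOp here) end up at the top of the combined ranking, mirroring the "wins multiple polls" analogy in the speaker notes.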
15. Relevant API Selection with Word Embedding
Semantic proximity: A > B if proximity(Q, A) > proximity(Q, B), for query Q
Each candidate API receives a semantic proximity score, computed with word embeddings learned from 1.4M Stack Overflow Q&A threads.
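A minimal sketch of the semantic proximity score, assuming proximity is the mean cosine similarity between the query keywords and the candidate API in the embedding space. The tiny 2-d vectors here are hand-made toy stand-ins; the actual system derives embeddings from a FastText model trained on Stack Overflow text.

```python
import math

# Semantic proximity as the mean cosine similarity between the query keywords
# and a candidate API class in an embedding space.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def proximity(query_words, api, emb):
    return sum(cosine(emb[w], emb[api]) for w in query_words) / len(query_words)

# Toy 2-d embeddings (illustrative values, not learned vectors).
emb = {
    "convert":        (0.90, 0.10),
    "grayscale":      (0.80, 0.30),
    "ColorConvertOp": (0.85, 0.20),  # image-conversion API: near the query words
    "IOException":    (0.10, 0.90),  # unrelated API: far from the query words
}
query = ["convert", "grayscale"]
```

An API that sits close to the query keywords in the semantic space (ColorConvertOp here) scores higher than an unrelated one (IOException), which is the A > B comparison the slide states.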
16. Impact of NLP2API on Search Query
Initial query "Convert image to gray scale without losing transparency" retrieves the target code at position 115. The reformulated query (BufferedImage, Grayscale, ImageEdit, ColorConvertOp, File, Transparency, ColorSpace, BufferedImageOp, Graphics, ImageEffects) retrieves it at position 02.
17. Masud Rahman, PhD
Assistant Professor
Faculty of Computer Science
Office: 218, Goldberg CS Building
Dalhousie University, Canada
masud.rahman@dal.ca
Interested in more details? Please visit: https://web.cs.dal.ca/~masud/raise
19. Query Expansion with Relevant API Classes
Initial query + ranked API classes (combining the Borda score and the semantic proximity score) → expanded query
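The expansion step on this slide might be sketched as follows. The simple additive combination of the two scores and the score values themselves are illustrative assumptions, not the tool's exact weighting.

```python
# Query expansion: combine the two per-API scores, rank the candidates, and
# append the top-K API classes to the initial natural-language query.
def expand_query(initial_query, borda_score, proximity_score, k=2):
    combined = {api: borda_score[api] + proximity_score[api] for api in borda_score}
    top = sorted(combined, key=combined.get, reverse=True)[:k]
    return initial_query + " " + " ".join(top)

# Illustrative normalised scores for three candidate API classes.
borda_score = {"ColorConvertOp": 0.90, "BufferedImageOp": 0.80, "IOException": 0.20}
proximity_score = {"ColorConvertOp": 0.95, "BufferedImageOp": 0.70, "IOException": 0.10}
query = expand_query("convert image to gray scale", borda_score, proximity_score)
# query == "convert image to gray scale ColorConvertOp BufferedImageOp"
```

The expanded query now contains API-class keywords that can match the target code directly, which is what makes the reformulated query so much more effective than the plain natural-language one.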
Editor's Notes
This is my academic journey.
I completed my undergrad from Khulna University back in 2009.
Then I came to Canada back in 2012 for my graduate studies. I completed my Masters and PhD from University of Saskatchewan.
Then in 2019, I moved to Polytechnique Montreal as a postdoctoral fellow.
This is an example bug report!
It talks about a software bug!
Now, the bug is hidden somewhere in the software code.
So, the process of finding that buggy code is called bug localization.
This is an example of a noisy bug report.
It contains stack trace information.
It contains five hundred keywords.
That means too many signals. It is hard to separate the signals from the noise.
That means, the bug report does not work well as a search query.
So, how to solve this problem? How can we make a query and find out the bug?
Once we have the graph, we use a graph-based algorithm called PageRank for keyword selection.
Now, this is a recursive algorithm, and it's a bit complex.
But I will try to explain it with this diagram.
So, the algorithm is based on voting mechanism.
That is, if a node gets enough votes from other important nodes, then this node is likely to be important.
So, why is this node bigger and laughing? Because it is being voted for by other important nodes.
So, this is a recursive process. Once the computation is over, we get a ranked list of nodes.
That means, from the trace graph, we get a ranked list of method and class names.
But how does it help? Lets see.
Now, let me show you how Information Retrieval-based bug localization works.
This is an example bug report, and we want to find out this buggy code from the codebase.
Now, what were the existing approaches doing?
They tried to use this whole bug report as a query.
Then they submit this query to a code search engine which is sitting on top of our codebase.
When I say search engine, I mean the local search engines like Lucene.
Now this ad hoc query returns the buggy code at the 53rd position. That means the developer needs to check 52 non-buggy files before reaching the buggy code, which is time-consuming and not good for developer productivity.
However, if we look closely, we see that this bug report contains 127 words.
If we can carefully choose these keywords and use them as a query, we can retrieve the buggy code at the topmost position, which is exactly what we want, right?
Now obviously, identifying these keywords is extremely challenging, which makes it our first research problem.
So, we reduce the bug localization problem into a keyword selection problem ☺
Developers search for software bugs and features within a local codebase.
However, searching within a local codebase might not be enough. They need to search on the web.
Studies show that they spend about 20% of their time on code search on the web.
So, lets say the developer is implementing a software feature and he/she needs the code that can convert an image to gray scale without losing the transparency
Now as a standard practice, the developer makes this natural language query, and submits the query to GitHub, the largest code repository on the web.
Now, GitHub provides this result. But as you see, it does not look very relevant.
The developer is looking for something like this.
If we look carefully, we see that it is pretty hard to retrieve this code with this query. Because, there is not enough keyword matching.
But if we can replace this natural language query with these relevant API classes, then we can get lots of keyword matching and we can easily retrieve this code.
But as you can imagine, transforming the NL query into these relevant API classes could be very challenging!
So, this makes it our third research problem.
And once again, Stack Overflow is our friend in this grand challenge.
First, we submit the query to Stack Overflow, which returns a list of relevant Q&A threads.
What is a Q&A thread?
Well, here is an example. This is a question, and this is the answer.
We also see that the answer contains several program elements such as API classes.
So, what we do? We capture these program elements using regular expressions.
Then we use two keyword selection algorithms to make two lists of candidate API classes.
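The regular-expression capture mentioned above can be sketched as follows. The CamelCase pattern and the answer snippet are illustrative assumptions; real extraction parses full Stack Overflow answer bodies rather than a single sentence.

```python
import re

# A rough CamelCase pattern as a proxy for Java API class names: an uppercase
# letter followed by lowercase characters, repeated at least twice.
CAMEL_CASE = re.compile(r"\b[A-Z][a-z0-9]+(?:[A-Z][a-z0-9]+)+\b")

# Made-up example fragment of an answer's text.
snippet = "Use ColorConvertOp with a gray ColorSpace on the BufferedImage."
candidates = CAMEL_CASE.findall(snippet)
# candidates == ["ColorConvertOp", "ColorSpace", "BufferedImage"]
```

Note that plain capitalized words like "Use" are skipped, because the pattern requires at least two CamelCase segments.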
But then what? We use two more metrics to detect the most relevant API classes from these lists.
The essence of Borda count is this: if API A is more frequent than API B in the relevant Q&A threads from Stack Overflow, A is more appropriate than B.
So, it’s a kind of likelihood of A over B for the target query.
For the second metric, we preprocess Stack Overflow corpus, develop a Skip-gram model using FastText, an improved version of Word2Vec.
Then we determine, how close an API is to the given query keywords within the semantic space.
So, if A is more semantically close to query Q than B, then A is more appropriate than B for the query.
So, we then combine these two metrics for each candidate API class, do the ranking, and return the Top-K classes as our reformulation terms.
Since we have two candidate lists, we use Borda count to find the most relevant API classes.
Now, this is how it works.
If Bernie wins according to multiple polls, he is a better candidate.
Similarly if API A ranks higher in multiple ranked list than API B, then API A is more relevant in our problem context.
So, yes, this is how, we get the Borda score for all the candidates.
We also use another way to find out the most relevant API classes, this is called semantic proximity.
For doing that, first, we collect 1.4 million Q&A threads from Stack Overflow and make a corpus.
We preprocess them and feed them to FastText.
Now, FastText is a neural text classifier tool that is basically a 3-layer neural network.
What it does is, it transforms the corpus into a semantic space.
Now, what is a semantic space?
Now, this is an example of semantic space for food names.
Here we see that Ramen is closer to Spaghetti than to Burger. Well, if you have tasted these foods, then it makes sense, right?
Well, similarly, we create semantic space for API classes and keywords and determine their semantic proximity.
Finally, we get the semantic proximity score for each API class.
So, here is the result.
If we use only the natural language query, we can retrieve this code example at bottom of the list.
But when we use the query our tool, it returns the same relevant code at the 2nd position, which is really interesting.
More extensive experiments could be found in the paper.
Now, we have two scores for each class.
What do we do?
We combine them, rank them, and then collect top few API classes.
Then we add them to the natural language query to get the expanded search query.