Software engineers share experiences with modern technologies by means of software information sites, such as Stack Overflow. These sites allow developers to label posted content, referred to as software objects, with short descriptions, known as tags. However, tags assigned to objects tend to be noisy and some objects are not well tagged.
To improve the quality of tags in software information sites, we propose EnTagRec, an automatic tag recommender based on historical tag assignments to software objects, and we evaluate its performance on four software information sites: StackOverflow, AskUbuntu, AskDifferent, and FreeCode.
We observe that EnTagRec achieves Recall@5 scores of 0.805, 0.815, 0.88, and 0.64, and Recall@10 scores of 0.868, 0.876, 0.944, and 0.753, on StackOverflow, AskUbuntu, AskDifferent, and FreeCode, respectively. Averaging across the four datasets, EnTagRec improves upon TagCombine, the state-of-the-art approach, by 27.3% in terms of Recall@5 and 12.9% in terms of Recall@10.
EnTagRec: An Enhanced Tag Recommendation System for Software Information Sites
1. EnTagRec: An Enhanced Tag Recommendation System for Software Information Sites
Shaowei Wang, David Lo, Bogdan Vasilescu, Alexander Serebrenik
@b_vasilescu @aserebrenik
2. / department of mathematics and computer science 25-9-2014 Page 2
7. ???
14. I have Java daemon which I want to pass shell commands. For example… P( )?
15. Actually (preprocessing…) P( )? Java daemon want pass shell command exampl daemon load configur possibl
16. Tags = nouns (phrases) P( )? Java daemon want pass shell command exampl daemon load configur possibl
17. P( | Java ) P( | daemon ) P( | shell ) … Estimate from the training data. Combine to get P for the entire text.
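The estimate-and-combine step on this slide can be sketched as a naive-Bayes-style computation over per-word tag probabilities. This is a minimal illustration with a toy training set, not the paper's exact Bayesian Inference Component; the function names and the smoothing constants are mine.

```python
from collections import defaultdict

def train(tagged_posts):
    """Count word/tag co-occurrences in the training data."""
    word_tag = defaultdict(lambda: defaultdict(int))
    word_total = defaultdict(int)
    for words, tags in tagged_posts:
        for w in words:
            for t in tags:
                word_tag[w][t] += 1
            word_total[w] += len(tags)
    return word_tag, word_total

def score(words, tag, word_tag, word_total, smooth=1.0, vocab=100):
    """Combine per-word estimates P(tag | word) into a score for the text,
    with additive smoothing so unseen pairs do not zero everything out."""
    p = 1.0
    for w in words:
        p *= (word_tag[w][tag] + smooth) / (word_total[w] + smooth * vocab)
    return p

# Toy training set: (stemmed words of a post, its tags)
posts = [(["java", "daemon", "shell"], ["java", "linux"]),
         (["java", "ide"], ["java", "eclipse"])]
wt, tot = train(posts)
# "java daemon" scores higher for the tag "java" than for "eclipse"
assert score(["java", "daemon"], "java", wt, tot) > \
       score(["java", "daemon"], "eclipse", wt, tot)
```

The per-word probabilities are estimated from the training data (the counting step), and the product combines them into a single score for the entire text, mirroring the two bullets on the slide.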
22. EnTagRec
• is better than BIC and FIC separately

Dataset        Metric  EnTagRec  BIC    FIC
StackOverflow  r@5     0.805     0.565  0.593
StackOverflow  p@5     0.346     0.232  0.258
AskUbuntu      r@5     0.815     0.505  0.637
AskUbuntu      p@5     0.358     0.212  0.282
AskDifferent   r@5     0.88      0.523  0.713
AskDifferent   p@5     0.369     0.212  0.298
FreeCode       r@5     0.64      0.391  0.545
FreeCode       p@5     0.382     0.230  0.322
23. EnTagRec
• is better than BIC and FIC separately
• is better than TagCombine
24. So, what was all this about?
Editor's Notes
Missing tags.
As tagging is inherently a distributed and uncoordinated process, similar objects are often tagged differently [40]. Such idiosyncrasy reduces the usefulness of tags, since related objects are not linked together by a common tag and relevant information becomes more difficult to retrieve. To illustrate the importance of tags for the well-functioning of a software information site, note that META STACK OVERFLOW has more than 4,000 questions related to tags, as opposed to only about 1,000 related to user interface.
Some synonyms are really strange
Stress that we have taken samples of these websites. For instance, for StackOverflow we have taken 47K questions with 437 tags…
FreeCode – in addition to free-form textual tags, objects carry tags such as the general application domain (e.g., Software Development for Apache Ant) and a more specific category within this domain (BuildTools).
Precision and recall: *averages*. Recall@5 is the average of |Top5Predicted ∩ Correct| / |Correct|, averaging over all software objects in the test set. Precision@5, similarly, is the average of |Top5Predicted ∩ Correct| / |Top5Predicted|.
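The averaged metrics defined in this note can be sketched as follows (a small illustration; the function names and the toy test set are mine):

```python
def recall_at_k(predicted, correct, k=5):
    """|top-k predicted ∩ correct| / |correct| for one software object."""
    hits = len(set(predicted[:k]) & set(correct))
    return hits / len(correct)

def precision_at_k(predicted, correct, k=5):
    """|top-k predicted ∩ correct| / |top-k predicted| for one object."""
    hits = len(set(predicted[:k]) & set(correct))
    return hits / len(predicted[:k])

def average_metric(metric, objects, k=5):
    """Average a per-object metric over the whole test set."""
    return sum(metric(p, c, k) for p, c in objects) / len(objects)

# Toy test set: (ranked predicted tags, true tags)
test = [(["java", "linux", "shell", "ide", "python"], ["java", "shell"]),
        (["python", "django", "web", "css", "html"], ["python"])]
print(average_metric(recall_at_k, test))     # prints 1.0: every true tag is in the top 5
print(average_metric(precision_at_k, test))  # prints 0.3: (2/5 + 1/5) / 2
```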
BIC and FIC return a set of tags with associated probabilities, e.g., B_o(t) is the probability that object o will be tagged t according to the Bayesian approach. K_Bayesian and K_Frequentist are the numbers of tags returned by each of the components (default 70); the Composer Component will return K tags (as in Recall@K, i.e., 5).
D. Ramage, D. Hall, R. Nallapati, and C. D. Manning. Labeled LDA: A supervised topic model for credit attribution in multi-labeled corpora. In EMNLP ’09, pages 248–256, 2009. (EMNLP = Empirical Methods in Natural Language Processing)
Tag [python] stresses that tags do not necessarily come literally from the text.
This is where the weights are coming from
We infer at most K_Frequentist tags; indeed, StackOverflow has more than 40,000 tags.
Spreading activation network. Each node in the network corresponds to a tag (all tags considered for the training set), and each edge connecting two nodes corresponds to the relationship between the corresponding tags. The weight of a node is initially zero. The weight of an edge is the ratio of the number of objects in the training set tagged with both tags to the number of objects tagged with at least one of the two tags.
The set of starting tags are the tags inferred by the approach presented so far, with weights as calculated above. So in this example we assume that in general six tags are considered (Java, Linux, Eclipse, Python, IDE, Project), but for a given software object the previous approach has inferred two tags: Java with probability 0.5 and Python with probability 0.6.
Max hop in this example is 2, and this is why we do not update the weight of Project in the first activation.
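The spreading-activation pass described in the three notes above can be sketched as follows. This is a hedged illustration, not the paper's exact update rule: the decay behavior is simplified to "gain = source weight × edge weight", and the edge weights below are invented values for the slide's six-tag example.

```python
def spread(edges, seeds, max_hops=2):
    """Start from seed tags with their initial weights and propagate
    activation along weighted edges for max_hops hops."""
    weights = dict(seeds)      # node weights; non-seed nodes start at 0
    frontier = dict(seeds)     # nodes activated in the previous hop
    for _ in range(max_hops):
        nxt = {}
        for node, w in frontier.items():
            for (a, b), ew in edges.items():
                if node in (a, b):
                    other = b if node == a else a
                    gain = w * ew              # simplified decay rule
                    weights[other] = weights.get(other, 0.0) + gain
                    nxt[other] = nxt.get(other, 0.0) + gain
        frontier = nxt
    return weights

# Edge weight = |objects tagged with both| / |objects tagged with either|
# (hypothetical values for the example graph)
edges = {("Java", "Eclipse"): 0.4, ("Java", "IDE"): 0.2,
         ("Python", "IDE"): 0.3, ("IDE", "Project"): 0.1}
# Seeds: the two tags inferred for this object by the previous approach
seeds = {"Java": 0.5, "Python": 0.6}
final = spread(edges, seeds, max_hops=2)
```

With max_hops = 2, Project (two edges away from both seeds) only receives weight on the second hop, matching the note above.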
alpha and beta can be tuned for any evaluation criterion EC; we use Recall@5, as used in the literature. Reminder: Recall@5 is the *average* of |Top5Predicted ∩ Correct| / |Correct|, averaging over all software objects in the test set.
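The composer step, weighting the two components' tag probabilities with alpha and beta and keeping the top K tags, can be sketched as a linear combination. The probabilities and weight values below are illustrative, not from the paper:

```python
def compose(bic, fic, alpha, beta, k=5):
    """Combine Bayesian (BIC) and frequentist (FIC) tag probabilities
    into one ranked list and keep the top-k tags."""
    tags = set(bic) | set(fic)
    combined = {t: alpha * bic.get(t, 0.0) + beta * fic.get(t, 0.0)
                for t in tags}
    return sorted(combined, key=combined.get, reverse=True)[:k]

# Hypothetical component outputs for one software object
bic = {"java": 0.5, "linux": 0.2, "shell": 0.1}
fic = {"java": 0.4, "python": 0.3}
top = compose(bic, fic, alpha=0.6, beta=0.4, k=2)
# "java" ranks first: 0.6*0.5 + 0.4*0.4 = 0.46
```

Tuning then amounts to searching over (alpha, beta) pairs and keeping the pair that maximizes Recall@5 on held-out data.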
These values are for the optimal settings K_Bayesian = K_Frequentist = 70. With both K's equal to 10, the recall values are lower but still better than those of BIC/FIC on the StackExchange datasets (recall between 70% and 80%).
The r@5 and p@5 values in this table are averages, but we have also conducted a more thorough study using Dunnett-type contrasts, showing that EnTagRec indeed outperforms BIC and FIC, and that EnTagRec outperforms TagCombine on the StackExchange datasets.